Abstract

Hypothesis tests for the presence of new sources of Poisson counts amidst
background processes are frequently performed in high energy physics (HEP), gamma
ray astronomy (GRA), and other branches of science. While there are conceptual
issues already when the mean rate of background is precisely known, the issues are
even more difficult when the mean background rate has non-negligible uncertainty.
After describing a variety of methods to be found in the HEP and GRA literature,
we consider in detail three classes of algorithms and evaluate them over a wide range
of parameter space, by the criterion of how closely the ensemble-average Type I error
rate (rejection of the background-only hypothesis when it is true) agrees with the
nominal significance level given by the algorithm. We recommend wider use of an
algorithm firmly grounded in frequentist tests of the ratio of Poisson means, although
for very low counts the over-coverage can be severe due to the effect of discreteness.
We extend the studies of Cranmer, who found that a popular Bayesian-frequentist 
hybrid can undercover severely when taken to high Z values. We also examine the 
profile likelihood method, which has long been used in GRA and HEP; it provides 
an excellent approximation in much of the parameter space, as previously studied 
by Rolke and collaborators. 
1 Introduction



The incorporation of systematic uncertainties into hypothesis tests (and by 
implication into confidence intervals and limits) remains a murky area of data 
analysis in spite of much study in the professional statistics community and in 
high energy physics, in gamma ray astronomy, and in other branches of science 
[1]. Exact methods using the frequentist definition of probability typically do 
not exist, while purely Bayesian methods, as commonly used in high energy 
physics, invoke uniform priors which make the resulting probability statements 
hard to interpret if not completely arbitrary. 

The foundational issues already arise in startlingly simple prototype problems
such as the one that we examine in this paper: $n_{\rm on}$ events are observed from
a Poisson process with mean $\mu_s + \mu_b$, where $\mu_s$ is the unknown parameter
of interest (the mean number of signal events), while $\mu_b$ is the mean number
of background events (mimicking signal events), measured to have a value
$\hat\mu_b$ with some uncertainty from subsidiary observations. One wishes to test
the hypothesis $H_0$ that $\mu_s = 0$, i.e., that the observed number of events is
statistically consistent with being all background. In this paper, we focus on
the significance level $\alpha$ of the hypothesis test, also known as the size of the
test, and in particular consider the very small values of $\alpha$ corresponding to a
statistical significance of up to five standard deviations. In the formal theory
of Neyman-Pearson hypothesis testing, $\alpha$ is specified in advance; once data
are obtained, the p-value is the smallest value of $\alpha$ for which $H_0$ would be
rejected. In a real application, the power of the test, which depends on the
alternative hypothesis, should be considered as well, but we do not explore
that complementary aspect of the test here [2]. Also, we do not address the
complex issue of the utility of p-values, which is discussed by Berger and
others (e.g., Refs. [3,4]); we merely remind the reader that at best, a p-value
conveys the probability under $H_0$ of obtaining a value of the test statistic at
least as extreme as that observed, and that it should not be interpreted as the
probability that $H_0$ is true. Having said that, given the ubiquity of p-values in
the literature, we study in detail the efficacy of three methods for calculating
p-values in the presence of systematic uncertainties.

Frequently the p-value is communicated by specifying the corresponding number
of standard deviations in a one-tailed test of a Gaussian (normal) variate;
i.e., one communicates a Z-value (often called S in HEP) given by

$$ Z = \Phi^{-1}(1-p) = -\Phi^{-1}(p), \tag{1} $$

where

$$ \Phi(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} \exp(-t^2/2)\, dt = \frac{1 + {\rm erf}(Z/\sqrt{2})}{2}, \tag{2} $$

so that

$$ Z = \sqrt{2}\, {\rm erf}^{-1}(1-2p). \tag{3} $$

Thus, for example, Z = 3 corresponds to a p-value of $1.35 \times 10^{-3}$. This relation
can be approximated to better than 1% for Z > 1.6 as

$$ Z \approx \sqrt{u - \ln u}, \tag{4} $$

where $u = -2\ln(p\sqrt{2\pi})$. (See Appendix B.) This form fortuitously is much
more accurate than directly inverting the full asymptotic expansion to second
or third order. Asymptotically, Z goes as $\sqrt{-\ln p}$ at large Z.
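The conversions in Eqns. 1-4 can be sketched in a few lines. This is a minimal illustration assuming Python with only the standard library; the function names are illustrative and are not taken from the software used for this paper.

```python
import math
from statistics import NormalDist


def z_from_p(p: float) -> float:
    """Eqn. 1: one-tailed Gaussian Z-value for a p-value p in (0, 1)."""
    return -NormalDist().inv_cdf(p)


def p_from_z(z: float) -> float:
    """Inverse of Eqn. 1, written with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_approx(p: float) -> float:
    """Eqn. 4: Z ~ sqrt(u - ln u), with u = -2 ln(p sqrt(2 pi))."""
    u = -2.0 * math.log(p * math.sqrt(2.0 * math.pi))
    return math.sqrt(u - math.log(u))


if __name__ == "__main__":
    for z in (1.6, 3.0, 5.0):
        p = p_from_z(z)
        print(f"Z = {z}: p = {p:.3e}, Eqn. 4 gives Z = {z_approx(p):.4f}")
```

Using `erfc` rather than `1 - erf` in `p_from_z` is a deliberate choice: it avoids loss of precision for the very small tail probabilities that correspond to large Z.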

If the uncertainty on $\mu_b$ vanishes (so that $\hat\mu_b = \mu_b$), some controversy exists
as to the best way to proceed, but at least in that case there seems to be some
clarity about the different methods, their performance, and their merits and
demerits. In contrast, if the uncertainty on $\mu_b$ is non-negligible, then the nature
of the subsidiary measurement of $\mu_b$ becomes crucial, and the interpretation
of results of various recipes (algorithms for computing the p-value) becomes
much more difficult. We take a pragmatic point of view that the performance
of a recipe is of more interest than the foundational solidity of the recipe, and
evaluate this performance by the frequentist criterion of how well the nominal
significance level of a test corresponds to the true frequency of Type I errors
(rejecting $H_0$ when it is true).

As in Ref. [5], we consider two variations of this prototype problem (described
in Sec. 2), which differ in the specification of the subsidiary measurement of
$\mu_b$. In the first case, it is a (typically small-integer) Poisson measurement in
a signal-free control region, and in the second case it is a Gaussian (normal)
measurement with known rms deviation. Section 3 describes the little-used
fact [5,6] that the standard frequentist solution to the ratio-of-Poisson-means
problem can be directly applied to the first prototype problem at hand, which
makes evaluation of Z easy with modern software tools. In Sec. 4, we outline
the frequentist-Bayesian hybrid which is commonly used in HEP, noting its
lack of foundational solidity and ambiguity due to choice of the Bayesian
prior. We note the remarkable mathematical connection between one choice
of prior and the frequentist solution of Sec. 3. In Sec. 5, we explore the profile
likelihood method (well-known in HEP as the MINUIT MINOS method [7] and
in gamma ray astronomy (GRA) as popularized by Li and Ma [8]), which gives
approximate results based on likelihood ratios. In Sec. 6, we briefly describe
other methods, and in Sec. 7 we compare some results obtained with all the
methods.

In the remaining sections we focus on the three main methods introduced in
Secs. 3-5, and study in detail the relations among the computed Z values and
the Type I error rates, as one spans the space of true values of the parameters.
We conclude in Sec. 10 that the little-used frequentist solution should have
much broader use, and we even advocate its prudent use in the second
prototype problem, in which it applies only via a rough correspondence. As found in
Refs. [9,10] (which advocate some modifications), the profile likelihood method
provides remarkably good results over a wide range of parameters. Given the
richness of results even for these simple prototype problems, there remains
much work to be done, beyond the scope of this paper, in exploring the
performance of other recipes and further generalizations to more complicated
problems [1,11-13].

Appendix A contains a summary of our notation. Appendix B has a derivation
of Eqn. 4, followed in Appendix C by a proof of the "remarkable connection"
mentioned above. Calculational details of the various Z-values are in
Appendix D, and some implementation examples are in Appendix E.



2 Two prototype problems differing in the measurement of $\mu_b$

2.1 The on/off problem

In the first prototype problem, which we refer to as the "on/off" problem, the
subsidiary measurement of $\mu_b$ consists of the observation of $n_{\rm off}$ events in a
control region where no signal events are expected. In HEP, the control region
is commonly referred to as a "sideband" since it is typically a sample of events
which is near the signal region in some measured parameter, i.e., in a band
of that parameter alongside but disjoint from the parameter values where the
signal might exist.

This HEP prototype problem has an exact analog in gamma ray astronomy
(GRA), upon which we base our notational subscripts "on" and "off". The
observation of $n_{\rm on}$ photons when a telescope is pointing at a potential source
("on-source") includes both background and the source, while the observation
of $n_{\rm off}$ photons with the telescope pointing at a source-free direction nearby
("off-source") is the subsidiary measurement. In both the HEP and GRA
examples, we let the parameter $r$ denote the ratio of the expected means of
$n_{\rm off}$ and $n_{\rm on}$ under $H_0$, i.e., when $\mu_{\rm on} = \mu_b$:

$$ r = \mu_{\rm off}/\mu_b. \tag{5} $$

In GRA, $r$ in the simplest case is the ratio of observing time off/on source
(subject to corrections in more complicated cases), while in HEP the calculation
of $r$ might involve background shapes, efficiencies, etc., determined by
Monte Carlo simulation. In the prototype problems studied in detail in this
paper we assume that $r$ itself is known exactly or with negligible uncertainty.
Thus, since the point estimate of $\mu_{\rm off}$ is $n_{\rm off}$, the point estimate of $\mu_b$ is

$$ \hat\mu_b = n_{\rm off}/r. \tag{6} $$

2.2 The Gaussian-mean background problem 

In a second prototype problem, which we refer to as the "Gaussian-mean
background" problem, the subsidiary measurement of $\mu_b$ is assumed to be drawn
from a Gaussian (normal) probability density function (pdf) with rms deviation
$\sigma_b$. We emphasize that while the measurement of the background mean
has a Gaussian pdf, the number of background counts obeys Poisson statistics
according to the fixed but unknown true background mean as described
above. In this paper, we consider two cases, one in which $\sigma_b$ is known
absolutely, and one in which $\sigma_b$ is known to be a fraction $f$ of $\mu_b$, and therefore the
experimenter estimates $\sigma_b$ by $f\hat\mu_b$ in analyzing the data from an experiment.

2.3 Correspondence between the two problems 

These two problems have an approximate correspondence since a rough estimate
of the uncertainty in estimating $\mu_{\rm off}$ by $n_{\rm off}$ is $\sqrt{n_{\rm off}}$, so that a rough
estimate of the uncertainty on $\mu_b$ in the first problem is $\sqrt{n_{\rm off}}/r$. Thus, the
correspondence is

$$ \sigma_b = \sqrt{n_{\rm off}}/r, \tag{7} $$

which when combined with Eqn. 6 yields

$$ r = \hat\mu_b/\sigma_b^2. \tag{8} $$

We emphasize that in using this rough correspondence in equations, one takes
both conceptual and numerical liberties. Nonetheless, it is useful to study the
pragmatic consequences of transferring recipes between the two prototype
problems based on the correspondence in Eqns. 7 and 8, while of course
keeping in mind the lack of firm foundation.
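The rough correspondence of Eqns. 6-8 can be written as a pair of conversion helpers. This is a minimal sketch in standard-library Python; the function names are hypothetical, and the conversions inherit all the conceptual liberties noted above.

```python
import math
from typing import Tuple


def gaussian_from_onoff(n_off: float, r: float) -> Tuple[float, float]:
    """Eqns. 6 and 7: (mu_b_hat, sigma_b) from the off-source count and r."""
    mu_b_hat = n_off / r
    sigma_b = math.sqrt(n_off) / r
    return mu_b_hat, sigma_b


def onoff_from_gaussian(mu_b_hat: float, sigma_b: float) -> Tuple[float, float]:
    """Invert Eqns. 6 and 7: an effective (n_off, r) for a Gaussian measurement.

    The effective n_off is generally not an integer.
    """
    r = mu_b_hat / sigma_b**2          # Eqn. 8
    n_off = (mu_b_hat / sigma_b) ** 2  # follows from Eqn. 6 with this r
    return n_off, r


if __name__ == "__main__":
    mu_b_hat, sigma_b = gaussian_from_onoff(n_off=5, r=5.0)
    print(mu_b_hat, sigma_b)
    print(onoff_from_gaussian(mu_b_hat, sigma_b))
```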



3 Frequentist solution to the on/off problem



The on/off problem above maps exactly onto one of the classic problems in
statistics, namely that of constructing hypothesis tests for the ratio of Poisson
means (solved by Przyborowski and Wilenski [14]). Each of $n_{\rm on}$ and $n_{\rm off}$ is
a sample from a Poisson distribution, with unknown means $\mu_{\rm on}$ and $\mu_{\rm off}$
respectively; the background-only hypothesis $H_0$ is therefore that the ratio of Poisson means
$\lambda = \mu_{\rm off}/\mu_{\rm on}$ is equal to the corresponding ratio with background only, $r$.

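The joint probability of observing $n_{\rm on}$ and $n_{\rm off}$ is the product of Poisson
probabilities for $n_{\rm on}$ and $n_{\rm off}$, and can be rewritten as the product of a single Poisson
probability with mean $\mu_{\rm tot} = \mu_{\rm on} + \mu_{\rm off}$ for the total number of events $n_{\rm tot}$, and
the binomial probability that this total is divided as observed if the binomial
parameter $p$ is $p = \mu_{\rm on}/\mu_{\rm tot} = 1/(1 + \lambda)$:

$$ P(n_{\rm on}, n_{\rm off}) = \frac{e^{-\mu_{\rm on}} \mu_{\rm on}^{n_{\rm on}}}{n_{\rm on}!} \times \frac{e^{-\mu_{\rm off}} \mu_{\rm off}^{n_{\rm off}}}{n_{\rm off}!} \tag{9} $$

$$ = \frac{e^{-\mu_{\rm tot}} \mu_{\rm tot}^{n_{\rm tot}}}{n_{\rm tot}!} \times \frac{n_{\rm tot}!}{n_{\rm on}!\,(n_{\rm tot} - n_{\rm on})!}\, p^{n_{\rm on}} (1 - p)^{(n_{\rm tot} - n_{\rm on})}. \tag{10} $$

That is, rewriting in terms of observables $(n_{\rm on}, n_{\rm tot})$ and parameters $(\lambda, \mu_{\rm tot})$:

$$ P(n_{\rm on}, n_{\rm off}; \mu_{\rm on}, \mu_{\rm off}) = P(n_{\rm tot}; \mu_{\rm tot})\, P(n_{\rm on} \,|\, n_{\rm tot}; p) \tag{11} $$

$$ = P(n_{\rm tot}; \mu_{\rm tot})\, P(n_{\rm on} \,|\, n_{\rm tot}; 1/(1 + \lambda)), \tag{12} $$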

where on the right-hand side the probabilities P are Poisson and binomial,
respectively. In this form, all the information about the ratio of Poisson means
$\lambda$ (and hence about $H_0$) is in the conditional binomial probability for the
observed "successes" $n_{\rm on}$, given the observed total number of events
$n_{\rm tot} = n_{\rm on} + n_{\rm off}$. In the words of Reid [15], ". . . it is intuitively obvious that there is
no information on the ratio of rates from the total count. . .". The same result
was obtained in the HEP community by James and Roos [16] and in the GRA
community by Gehrels [17]. Therefore one simply uses $n_{\rm on}$ and $n_{\rm tot}$ to look up
a standard hypothesis test result for the binomial parameter $p$, and rewrites
it in terms of $r$ and hence $H_0$. To be more explicit, in the notation thus far
$H_0$ can be variously expressed as: $\mu_s = 0$; $\mu_{\rm on} = \mu_b$; $\mu_{\rm off}/\mu_{\rm on} = r$; $\lambda = r$; or as
most relevant here, $p = 1/(1 + r)$. In the last form, the standard frequentist
binomial parameter test can be used; this dates back to the first construction
of confidence intervals for a binomial parameter by Clopper and Pearson in
1934 [2,18].

The p-value for the test of $p = 1/(1+r)$, and hence of $H_0$, is then the one-tailed
probability sum:

$$ p_{\rm Bi} = \sum_{j=n_{\rm on}}^{n_{\rm tot}} P(j \,|\, n_{\rm tot}; p). \tag{13} $$

This can be computed from a ratio of incomplete and complete beta functions
(both denoted by B and distinguished by the number of arguments):

$$ p_{\rm Bi} = B(p, n_{\rm on}, 1 + n_{\rm off}) / B(n_{\rm on}, 1 + n_{\rm off}). \tag{14} $$



The corresponding Z-value, $Z_{\rm Bi}$, then follows using Eqn. 3. This ratio in
Eqn. 14 is itself called "the" incomplete beta function in Numerical Recipes
[19], which contains an algorithm for calculating it. This algorithm is
implemented in the analysis software package ROOT [20]; examples of the ROOT
implementation are in Appendix E. This implementation, however, runs into
numerical troubles for large values of its parameters; for the calculations in
this paper we use a different implementation of the incomplete beta function
due to Majumder and Bhattacharjee [21], which exhibits good precision over
the parameter space studied here.
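For integer counts, the binomial tail sum of Eqn. 13 can also be evaluated directly, without an incomplete beta function. The following is a minimal sketch in standard-library Python, not the implementation used for the results in this paper; the function names are illustrative, and the term-by-term sum is only practical for moderate $n_{\rm tot}$ and does not cover the non-integer $n_{\rm off}$ that the beta-function form allows.

```python
import math
from statistics import NormalDist


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, as in Eqn. 13."""
    total = 0.0
    for j in range(max(k, 0), n + 1):
        # Work with logarithms so that large factorials do not overflow.
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def z_bi(n_on: int, n_off: int, r: float) -> float:
    """Z-value for the on/off problem from the binomial test of p = 1/(1+r)."""
    p_bi = binomial_tail(n_on, n_on + n_off, 1.0 / (1.0 + r))
    if p_bi >= 1.0:  # e.g. n_on = 0: no evidence against background-only
        return float("-inf")
    return -NormalDist().inv_cdf(p_bi)


if __name__ == "__main__":
    print(binomial_tail(4, 9, 1.0 / 6.0), z_bi(4, 5, 5.0))
```

For example, with $n_{\rm on} = 4$, $n_{\rm off} = 5$ and $r = 5$, the test is of $p = 1/6$ with $n_{\rm tot} = 9$.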

As reviewed by Cousins [22], the above construction for tests of the ratio of 
Poisson means (or equivalently, confidence intervals for the ratio of Poisson 
means) is used broadly in science and engineering. This use of conditional 
binomial probabilities in a problem with discrete observations is discussed in 
Ref. [22], which observes that these need not correspond to uniformly most
powerful unbiased tests, since the theorem of Lehmann and Scheffé assumes
continuous observables. Ref. [22] constructs a set of binomial confidence
intervals which are subsets of the standard ones (and therefore at least as short
in any metric). However, use of such intervals remains controversial because
of the importance with which conditioning is regarded in statistical inference 
[15], as also discussed in Ref. [22]. For the demonstrations in this paper, we use 
the standard set, which is more conservative, particularly for small numbers 
of counts, due to the discreteness. 

Remarkably, while the ratio-of-Poisson-means problem and solution are widely
known, its straightforward application to the central problem of this paper
seems to have escaped both the GRA and HEP communities, except for the
1990 paper by Zhang and Ramsden [6] in GRA and the recent paper by one
of us [5], which is the only paper we know of that cited Zhang and Ramsden.



4 Bayesian-frequentist hybrid recipes for the two problems 



Recipes which combine Bayesian-style averaging with frequentist calculation 
of tail-integral probabilities may have intuitive appeal and some adherents in 
the professional statistics community [23,24], but such mixing of paradigms 
must be viewed with care: either one is introducing the foreign notion of a pdf 
of an unknown true value into a frequentist calculation, or one is introducing 
the foreign notion of a tail probability (i.e., probability of obtaining data 
not observed) into a Bayesian calculation. Once a hybrid method is used to
calculate a p-value or a Z-value significance, then it is by definition attempting
a frequentist claim, and it is appropriate to evaluate it by those standards, and
in particular to test whether the true Type I error rate of the method is consistent
with claimed significance levels: if not, this is a weakness of the method.

Thus, the properties of such hybrid calculations must be understood, in the 
present context by computing the true Type I error rate of a hypothesis test 
with significance level corresponding to some chosen stated Z-values. Cousins 
and Highland [25] recommended such a hybrid for the prototype problem of 
small-count upper limits in which one wishes to incorporate an uncertainty in 
the normalization. The resulting upper limits as applied in HEP (which typically
take a uniform prior for the background mean) appear to be conservative,
i.e., the Type I error rate of the corresponding hypothesis test is less than 
implied by the quoted Z-value. The basic idea has been extended to problems 
in which the uncertainty is on the mean background, with studies such as 
that of Tegenfeldt and Conrad [26] indicating continued conservatism in the 
results, at least for low Z-values. However, Cranmer has warned [12] that for 
Z = 5, gross over-statement of the significance can result. Thus it is important 
to define the recipe(s) precisely and study the performance. 

For the two prototype problems in Sec. 2, if there is no uncertainty in $\mu_b$,
then $\hat\mu_b = \mu_b$ and the p-value (denoted by $p_{\rm P}$) can be obtained immediately
by computing the Poisson probability of obtaining $n_{\rm on}$ or greater counts given
true mean $\mu_b$:

$$ p_{\rm P} = \sum_{j=n_{\rm on}}^{\infty} e^{-\mu_b} \mu_b^j / j! = \Gamma(n_{\rm on}, 0, \mu_b)/\Gamma(n_{\rm on}), \tag{15} $$

here written [27] in terms of the lower incomplete $\Gamma$ function,

$$ \Gamma(n, 0, x) = \int_0^x t^{n-1} e^{-t}\, dt. \tag{16} $$
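The Poisson tail probability of Eqn. 15 can be evaluated by direct summation. The sketch below is an illustration in standard-library Python, assuming integer $n_{\rm on}$; the function name and the convergence threshold are choices made here, not taken from the paper.

```python
import math


def poisson_tail(n_on: int, mu_b: float) -> float:
    """P(N >= n_on) for N ~ Poisson(mu_b), i.e. p_P of Eqn. 15."""
    if n_on <= 0:
        return 1.0
    if mu_b <= 0.0:
        return 0.0
    if n_on <= mu_b:
        # In the bulk of the distribution, use the complement of the
        # short sum over j = 0 .. n_on - 1.
        term = math.exp(-mu_b)
        lower = 0.0
        for j in range(n_on):
            lower += term
            term *= mu_b / (j + 1)
        return 1.0 - lower
    # In the upper tail, sum the terms directly so that very small
    # p-values are not lost to cancellation in 1 - (lower sum).
    term = math.exp(n_on * math.log(mu_b) - mu_b - math.lgamma(n_on + 1))
    total = 0.0
    j = n_on
    while term > 1e-17 * total:
        total += term
        j += 1
        term *= mu_b / j
    return total


if __name__ == "__main__":
    print(poisson_tail(1, 1.0))   # equals 1 - exp(-1)
    print(poisson_tail(9, 0.1))   # a very small tail probability
```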



With uncertainty in $\mu_b$, then with the Bayesian definition of probability
(degree of belief), one can encapsulate the result of the background measurement
into a pdf $p(\mu_b)$, assumed to be normalized here. While this is sometimes
considered to be a prior pdf, Refs. [5,11,25] consider it to be the posterior
pdf of the background measurement, which is the (normalized) product of the prior pdf for
the background mean and the likelihood function from the subsidiary
measurement. In any case, ignoring foundational issues, one can then
attempt to introduce this uncertainty by averaging $p_{\rm P}$ over different values of
$\mu_b$, weighted by $p(\mu_b)$, so that the hybrid p-value so obtained is

$$ p = \int p_{\rm P}(\mu_b)\, p(\mu_b)\, d\mu_b. \tag{17} $$


While the above approach was viewed by Cousins and Highland as adding some 
Bayesian reasoning to a frequentist p-value, the same mathematical result 
is obtained if one starts from the Bayesian prior-predictive distribution and 
adds on a frequentist-style tail probability calculation to obtain the
prior-predictive p-value, as advocated by Box [24]; the different points of view simply
correspond to reversing the order of summing/integrating [5,12,28]. 

4.1 Hybrid recipe using Gaussian likelihood for the Gaussian-mean background problem: $Z_N$

A common assumption in HEP (even when the underlying statistics of the
measurement of $\mu_b$ is Poisson) is that of uniform prior and Gaussian
likelihood so that $p(\mu_b)$ is Gaussian. Then $p_N$ denotes the resulting hybrid p-value
obtained from Eqn. 17, and $Z_N$ denotes the Z-value derived from it via Eqn. 3.
(The subscript N is for "normal", the usage preferred by statisticians.) For the
results in this paper, we implemented our own program and checked that it
gave the same results as one of several such programs of which we are aware,
Ref. [29], except where renormalization caused a difference.

In typical programs (including ours), the low tail of the Gaussian is truncated
to avoid negative values of $\mu_b$ (and the result renormalized). If this truncation
is not negligible (so that the renormalization makes a non-negligible
difference), then conceptual as well as procedural problems arise. Conceptually, the
problem is a nonzero density for the true background at $\mu_b = 0$, despite a
nonzero measurement. As emphasized in Ref. [25], if truncation makes a
material difference, the Gaussian form of the pdf may not be appropriate, and
a form which goes to zero at the origin (such as log-normal) may be a better
model; in the next subsection, the Gamma density arises naturally
and is well-behaved in this respect. As Cranmer et al. have noted [30], one
must also understand the $Z\sigma_b$ contours of the background in order to claim
that Z-value. Thus, a sign that the Gaussian form is almost certainly
inadequate is if one finds Z such that $Z\sigma_b > \mu_b$, since in this case the computation
assumes that the high tail of the Gaussian is reliable in a region where the
corresponding low tail is in the non-physical negative region.

Furthermore, for $Z\sigma_b > \mu_b$ and large enough $\mu_b$, the systematic uncertainty
$\sigma_b$ is much larger than the statistical fluctuations in $n_{\rm on}$ (which are of order
$\sqrt{\mu_b}$). The circumstance in which one observes high Z is then essentially a
measurement $\hat\mu_b$ which is lower than $\mu_b$ by $Z\sigma_b$. But since $\hat\mu_b$ is constrained
to be non-negative, $\mu_b/\sigma_b$ becomes an effective upper limit on the observed
Z, which is only rarely significantly surpassed by anomalously high statistical
fluctuations in $n_{\rm on}$.

For both these reasons, $Z\sigma_b > \mu_b$ leads to unreliable Z; since $\sigma_b = f\mu_b$, the
criterion for unreliable Z is then roughly [30]

$$ Z > 1/f; \tag{18} $$

of course statistical fluctuations superimposed on the mean-background
uncertainty complicate the argument, but we take Eqn. 18 as a useful rule of
thumb, and care should be taken as Z approaches $1/f = \mu_b/\sigma_b$.
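The hybrid recipe of this subsection can be sketched numerically: the Poisson tail probability of Eqn. 15 is averaged over a Gaussian in $\mu_b$ that is truncated at zero and renormalized. The code below is an illustrative standard-library Python sketch, not the program used for the results in this paper; the trapezoidal integration, the integration range of 8 rms deviations, and the number of steps are assumptions made here for simplicity.

```python
import math
from statistics import NormalDist


def poisson_tail(n_on: int, mu: float) -> float:
    """P(N >= n_on) for N ~ Poisson(mu), as in Eqn. 15."""
    if n_on <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0
    if n_on <= mu:
        term, lower = math.exp(-mu), 0.0
        for j in range(n_on):
            lower += term
            term *= mu / (j + 1)
        return 1.0 - lower
    term = math.exp(n_on * math.log(mu) - mu - math.lgamma(n_on + 1))
    total, j = 0.0, n_on
    while term > 1e-17 * total:
        total += term
        j += 1
        term *= mu / j
    return total


def hybrid_p_gaussian(n_on: int, mu_b_hat: float, sigma_b: float,
                      n_steps: int = 4000, n_sigma: float = 8.0) -> float:
    """Average of the Poisson tail over a Gaussian truncated at mu_b = 0."""
    lo = max(0.0, mu_b_hat - n_sigma * sigma_b)  # truncation at zero
    hi = mu_b_hat + n_sigma * sigma_b
    h = (hi - lo) / n_steps
    num = den = 0.0
    for i in range(n_steps + 1):
        mu = lo + i * h
        w = 0.5 if i in (0, n_steps) else 1.0    # trapezoid end weights
        g = w * math.exp(-0.5 * ((mu - mu_b_hat) / sigma_b) ** 2)
        num += g * poisson_tail(n_on, mu)
        den += g
    return num / den  # dividing by den renormalizes the truncated Gaussian


def z_n(n_on: int, mu_b_hat: float, sigma_b: float) -> float:
    """Z-value corresponding to the hybrid p-value."""
    return -NormalDist().inv_cdf(hybrid_p_gaussian(n_on, mu_b_hat, sigma_b))
```

Following the rule of thumb of Eqn. 18, a Z-value returned by such a calculation deserves scrutiny once it approaches `mu_b_hat / sigma_b`.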



4.2 Hybrid recipe using Poisson likelihood for the on/off problem: $Z_\Gamma = Z_{\rm Bi}$



If the underlying statistics of the measurement of $\mu_b$ is Poisson, then an
alternative advocated by one of us some years ago [31], and which is also known
to the GRA community [32], again uses the uniform prior, but with the
likelihood function for $\mu_b$ appropriate to the on/off problem ($n_{\rm off}$ events observed
in a Poisson sample from a control region with mean that is $r$ times that of
the background in the signal region):

$$ \mathcal{L}(\mu_b) = \frac{(r\mu_b)^{n_{\rm off}}\, e^{-r\mu_b}}{n_{\rm off}!}. \tag{19} $$



With uniform prior, the posterior pdf $p(\mu_b)$ is the same mathematical
expression, which has the form of a Gamma density. Inserting this into Eqn. 17 results in
a p-value denoted by $p_\Gamma$ (given explicitly in Eqn. C.1) with a corresponding
Z-value denoted by $Z_\Gamma$.

Remarkably, the values computed for $Z_\Gamma$ are identical to those computed for
the frequentist result $Z_{\rm Bi}$ of Sec. 3! This is quite surprising even if not
unprecedented as a mathematical "coincidence" of results from Poisson-based
Bayesian and frequentist calculations; one can recall for example that upper
limits with uniform prior (and lower limits with $1/\mu$ prior) are identical to
corresponding frequentist results, due to an identity which connects integrals
of the Poisson probability over $\mu$ with sums over the observed integers [33].
In the present case, after the identity was suggested by numerical results in
preparation of Ref. [5], an unpublished proof was worked out [34]. Our more
recent, shorter proof is presented in Appendix C. The identity of $Z_\Gamma$ and
$Z_{\rm Bi}$ guarantees good frequentist properties for hybrid Bayesian-derived $Z_\Gamma$. Of
course there is no such guarantee for hybrid Bayesian-derived $Z_N$.
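The identity can be checked numerically for integer counts. In the sketch below (standard-library Python, illustrative names), the Gamma-weighted average of the Poisson tail is evaluated through the standard Poisson-Gamma mixture, which gives a negative binomial distribution for the on-source count; this closed form is an assumption of the sketch and is not necessarily the expression given in Eqn. C.1. The normalized posterior is taken to be a Gamma density with shape $n_{\rm off} + 1$ and rate $r$.

```python
import math


def p_binomial(n_on: int, n_off: int, r: float) -> float:
    """Frequentist p-value of Sec. 3: binomial tail with p = 1/(1+r)."""
    n_tot = n_on + n_off
    p = 1.0 / (1.0 + r)
    total = 0.0
    for j in range(n_on, n_tot + 1):
        total += math.exp(math.lgamma(n_tot + 1) - math.lgamma(j + 1)
                          - math.lgamma(n_tot - j + 1)
                          + j * math.log(p) + (n_tot - j) * math.log1p(-p))
    return total


def p_gamma(n_on: int, n_off: int, r: float) -> float:
    """Hybrid p-value: Poisson tail averaged over the Gamma posterior.

    Integrating Poisson(j | mu_b) against Gamma(shape = n_off + 1, rate = r)
    gives a negative binomial probability for j; the tail is obtained as
    1 - P(j < n_on), which is adequate for moderate p-values only.
    """
    a = n_off + 1
    q = r / (1.0 + r)
    lower = 0.0
    for j in range(n_on):
        lower += math.exp(math.lgamma(j + a) - math.lgamma(j + 1)
                          - math.lgamma(a)
                          + a * math.log(q) + j * math.log1p(-q))
    return 1.0 - lower


if __name__ == "__main__":
    for args in ((4, 5, 5.0), (10, 20, 2.0)):
        print(args, p_binomial(*args), p_gamma(*args))
```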



5 The profile likelihood method 



The profile likelihood method (based on asymptotic theory and therefore not
exact for finite sample sizes) has long been widely used for evaluating
approximate confidence intervals and regions in HEP, notably using the method called
MINOS in the CERN Program Library package MINUIT [7,35]. (Further
discussion, with some modifications, is in Refs. [9,10].) In GRA the application
to the on/off problem by Li and Ma [8] is widely cited. Using the
correspondence between confidence intervals and significance tests discussed in Ref. [2],
the test of the hypothesis $H_0$ that $\mu_s = 0$ at significance level $\alpha$ corresponds
to a test of whether $\mu_s = 0$ is contained in the $100(1 - 2\alpha)$% C.L. central confidence
interval for $\mu_s$. Thus the profile-likelihood-derived p-value for an obtained data
set is obtained by first finding the smallest C.L. for which $\mu_s = 0$ is included
in the profile-likelihood-derived approximate central confidence interval, and
then $p = (1 - {\rm C.L.})/2$. To obtain the approximate confidence interval, one
begins with the likelihood function; for the on/off problem, this is

$$ \mathcal{L}_P = \frac{(\mu_s + \mu_b)^{n_{\rm on}}\, e^{-(\mu_s + \mu_b)}}{n_{\rm on}!} \cdot \frac{(r\mu_b)^{n_{\rm off}}\, e^{-r\mu_b}}{n_{\rm off}!}, \tag{20} $$

while for the Gaussian-mean background problem with either absolute or
relative $\sigma_b$, it is

$$ \mathcal{L}_G = \frac{(\mu_s + \mu_b)^{n_{\rm on}}\, e^{-(\mu_s + \mu_b)}}{n_{\rm on}!} \cdot \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\left(-\frac{(\hat\mu_b - \mu_b)^2}{2\sigma_b^2}\right), \tag{21} $$

where as discussed below we have explored the effect of truncating the
Gaussian pdf in $\mu_b$ and renormalizing prior to forming $\mathcal{L}_G$.



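Using either $\mathcal{L}_P$ or $\mathcal{L}_G$, one obtains the likelihood ratio

$$ \Lambda(\mu_s) = \frac{\mathcal{L}(\mu_s, \hat{\hat\mu}_b(\mu_s))}{\mathcal{L}(\hat\mu_s, \hat\mu_b)}, \tag{22} $$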




where $\hat\mu_s$ and $\hat\mu_b$ are the maximum-likelihood estimates of $\mu_s$ and $\mu_b$,
respectively, obtained by maximizing the appropriate likelihood function with respect
to both $\mu_s$ and $\mu_b$, and $\hat{\hat\mu}_b(\mu_s)$ is the result of maximizing the likelihood
function only with respect to $\mu_b$, left as a function of $\mu_s$. The likelihood ratio in
Eqn. 22 has one free parameter, so under regularity conditions and in the limit
of large sample counts $n_{\rm tot}$, Wilks's asymptotic theorem [36] says that under
the null hypothesis, $-2\ln\Lambda(\mu_s)$ is distributed as a chi-square statistic with
one degree of freedom (d.o.f.). The $100(1 - 2\alpha)$% confidence interval would
therefore be the set of $\mu_s$ for which

$$ -2\ln\Lambda(\mu_s) \le F^{-1}_{\chi^2_1}(1 - 2\alpha), \tag{23} $$

where $F^{-1}_{\chi^2_1}$ is the inverse cumulative distribution function for the chi-square
with one d.o.f. The background-only hypothesis $H_0$ would then be rejected at
significance level $\alpha$ if the so-formed $100(1 - 2\alpha)$% C.L. confidence interval for
$\mu_s$ does not contain the value $\mu_s = 0$.

In the present case, as emphasized to us by Cranmer, the regularity conditions
of Wilks's theorem are in fact not satisfied since the null hypothesis ($\mu_s = 0$) is
on the boundary of allowed $\mu_s$. This affects the lower endpoint of the confidence
interval and changes the confidence level of the full intervals. However, the
asymptotic Type I errors associated with the upper endpoint and tail appear
to be unaffected, and we thus proceed using the nominal results for significance
claims. As noted above, the p-value is then the smallest value of $\alpha$ for which
$H_0$ would be rejected. As the chi-square with one d.o.f. is the positive half
of a Gaussian under an appropriate transformation of variables, the Z-value
corresponding to the p-value for an obtained data set can be computed directly
from the likelihood ratio as

$$ Z_{\rm PL} = \sqrt{-2\ln\Lambda(\mu_s = 0)}, \tag{24} $$

where the likelihood ratio is computed using $\mathcal{L}_P$ or $\mathcal{L}_G$, as appropriate for the
problem.
For the on/off problem and $\mathcal{L}_P$, the explicit result obtained from Eqn. 24 was
given by Li and Ma (their Eqn. 17) [8]:

$$ Z_{\rm PL} = \sqrt{2} \left[ n_{\rm on} \ln\frac{n_{\rm on}(1 + r)}{n_{\rm tot}} + n_{\rm off} \ln\frac{n_{\rm off}(1 + r)}{n_{\rm tot}\, r} \right]^{1/2}. \tag{25} $$
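As a cross-check of the closed form, the Li and Ma expression can be compared with a direct evaluation of the likelihood ratio for the on/off likelihood. The sketch below uses standard-library Python with illustrative function names; it assumes $n_{\rm on} > 0$, $n_{\rm off} > 0$ and an excess ($n_{\rm on} > n_{\rm off}/r$), and it uses the fact that with $\mu_s$ fixed at zero the likelihood is maximized at $\mu_b = n_{\rm tot}/(1 + r)$.

```python
import math


def z_pl_li_ma(n_on: float, n_off: float, r: float) -> float:
    """Closed-form profile likelihood Z-value for the on/off problem."""
    n_tot = n_on + n_off
    t_on = n_on * math.log(n_on * (1.0 + r) / n_tot)
    t_off = n_off * math.log(n_off * (1.0 + r) / (n_tot * r))
    return math.sqrt(max(0.0, 2.0 * (t_on + t_off)))


def log_like_onoff(mu_s: float, mu_b: float,
                   n_on: float, n_off: float, r: float) -> float:
    """Logarithm of the on/off likelihood, dropping the constant factorials."""
    return (n_on * math.log(mu_s + mu_b) - (mu_s + mu_b)
            + n_off * math.log(r * mu_b) - r * mu_b)


def z_pl_direct(n_on: float, n_off: float, r: float) -> float:
    """Z = sqrt(-2 ln Lambda(mu_s = 0)) from the two likelihood maxima."""
    mu_b_hat = n_off / r                  # unconditional estimates
    mu_s_hat = n_on - mu_b_hat
    mu_b_null = (n_on + n_off) / (1.0 + r)  # conditional estimate, mu_s = 0
    minus_2_ln_lambda = -2.0 * (
        log_like_onoff(0.0, mu_b_null, n_on, n_off, r)
        - log_like_onoff(mu_s_hat, mu_b_hat, n_on, n_off, r))
    return math.sqrt(max(0.0, minus_2_ln_lambda))


if __name__ == "__main__":
    print(z_pl_li_ma(4, 5, 5.0), z_pl_direct(4, 5, 5.0))
```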



6 Other Methods for Estimating Z 



Other methods for estimating Z found in the literature are typically of the
form of the ratio of the inferred signal size to its rms deviation, i.e., $Z = s/\sqrt{V}$,
where in the on/off problem the signal $s$ is estimated by
$s = n_{\rm on} - \hat\mu_b = n_{\rm on} - n_{\rm off}/r$, and where $V$ is an estimate of the variance of $s$.

One widely used form is

$$ Z_{sb} = \frac{s}{\sqrt{\hat\mu_b}} \tag{26} $$

(sometimes [37] imprecisely called the "signal to noise ratio"). While this
ignores the uncertainty in the background estimate, it is often used for
optimizing selection criteria, because of its simplicity.



Occasionally one also sees

$$ Z_{ssb} = \frac{s}{\sqrt{s + \hat\mu_b}} = \frac{s}{\sqrt{n_{\rm on}}}. \tag{27} $$



Aside from recommending $Z_{\rm PL}$, Ref. [8] mentions this form in their Eqn. 11. Our
experience is that this expression typically results from confusing a test of the
null hypothesis ($\mu_s = 0$) with estimating $\mu_s$ and its $1\sigma$ uncertainty once the
existence of a signal has been established. For example, if $n_{\rm on} = 9$ and $\hat\mu_b = 0.1$
with small uncertainty, then a correct Z-value will be very high, even though
an estimate of $\mu_s$ will have a relative uncertainty of roughly $1/\sqrt{n_{\rm on}} = 1/3$. (If
there is a paradox due to the notion that the estimate of $\mu_s$ is "only $3\sigma$ from
zero", it is resolved by carefully considering confidence intervals and noting
the non-Gaussian behavior.) In another extreme, if $\sigma_b$ is large, $Z_{ssb}$ can badly
over-estimate the significance.

Ref. [8] also gives as another example method (their Eqn. 5),

$$ V_{nn} = n_{\rm on} + n_{\rm off}/r^2, \tag{28} $$

(subscript nn for no null) which as the authors note treats $n_{\rm on}$ and $n_{\rm off}$ as
independent, and therefore does not consistently calculate $V$ under the null
hypothesis, $\mu_{\rm off}/\mu_{\rm on} = r$. In fact it biases against signals for $r > 1$ by
over-estimating $V$. In the limit of large $r$, $V_{nn} \to n_{\rm on} = s + \hat\mu_b$, where $\hat\mu_b$ has
negligible uncertainty. Then using $V_{nn}$ leads to $Z_{ssb}$, which as noted above is
not appropriate.

Ref. [5] has derived a related formula,

$$ V_{bo} = n_{\rm off}(1 + r)/r^2, \tag{29} $$

(subscript bo for background-only) by using only the off-source counts $n_{\rm off}$ to
estimate the mean and variance; while not optimal, it at least is consistent
with the null. Ref. [8] also provides (their Eqn. 9)

$$ V_{\rm BiN} = (n_{\rm on} + n_{\rm off})/r, \tag{30} $$

(subscript BiN for Binomial Normal) which better implements the null
hypothesis. It is interesting to note that taking a normal approximation to the
binomial test $Z_{\rm Bi}$ (that is, comparing the difference of the estimate of the binomial
parameter from its expected value $p$, to the square root of its normal-approximation
variance) yields $(n_{\rm on}/n_{\rm tot} - p)/\sqrt{p(1 - p)/n_{\rm tot}}$, which can be shown to be
identical to $Z_{\rm BiN} = s/\sqrt{V_{\rm BiN}}$.

Zhang and Ramsden [6] used a variance-stabilizing transformation to derive an
asymptotically normal variable with nearly constant variance (their Eqn. 23),

$$ Z_{\rm ZR} = \frac{2}{\sqrt{1 + 1/r}} \left( \sqrt{n_{\rm on} + 3/8} - \sqrt{(n_{\rm off} + 3/8)/r} \right). \tag{31} $$

The 3/8 speeds convergence to normality from the underlying discreteness.
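The variance estimates of Eqns. 28-30 and the stated equivalence between $Z_{\rm BiN}$ and the normal approximation to the binomial test are easy to check numerically. The following is a small illustrative sketch in standard-library Python; the function names and the dictionary keys `Z_nn` and `Z_bo` are labels chosen here for $s/\sqrt{V_{nn}}$ and $s/\sqrt{V_{bo}}$, not notation from the references.

```python
import math


def simple_z_estimates(n_on: float, n_off: float, r: float) -> dict:
    """Z = s / sqrt(V) for the variance estimates of Eqns. 28-30."""
    s = n_on - n_off / r                 # estimated signal
    v_nn = n_on + n_off / r**2           # Eqn. 28: "no null"
    v_bo = n_off * (1.0 + r) / r**2      # Eqn. 29: background-only
    v_bin = (n_on + n_off) / r           # Eqn. 30: binomial normal
    return {
        "Z_nn": s / math.sqrt(v_nn),
        "Z_bo": s / math.sqrt(v_bo),
        "Z_BiN": s / math.sqrt(v_bin),
    }


def z_normal_approx_binomial(n_on: float, n_off: float, r: float) -> float:
    """Normal approximation to the binomial test of p = 1/(1+r)."""
    n_tot = n_on + n_off
    p = 1.0 / (1.0 + r)
    return (n_on / n_tot - p) / math.sqrt(p * (1.0 - p) / n_tot)


if __name__ == "__main__":
    print(simple_z_estimates(4, 5, 5.0))
    print(z_normal_approx_binomial(4, 5, 5.0))
```

For the same inputs the two routes to $Z_{\rm BiN}$ agree to rounding error, as the algebra implies.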

One can also calculate a $Z_{\rm P}$ from the Poisson probability p-value in Eqn. 15
by substituting $\hat\mu_b$ for $\mu_b$, but such a $Z_{\rm P}$ ignores the uncertainty in $\hat\mu_b$.
Occasionally one sees substitutions of $\mu_b \to \hat\mu_b + \sigma_b$ into Eqn. 15 in an attempt
to incorporate the uncertainty in $\hat\mu_b$.

A different approach, known as the Fraser-Reid method, attempts to move
directly from likelihood to significance by using a 3rd-order expansion [38,39].
The mathematics is interesting, combining two first order estimates (which
give significance to order $1/\sqrt{n}$) to yield a $1/\sqrt{n^3}$ result. Typically, the
first-order estimates are of the form of a normal deviation, $Z_t$ (like $Z_{\rm BiN}$), and a
likelihood ratio like $Z_{\rm PL}$; of these, the likelihood ratio is usually a better
first-order estimate. The two are then combined into the third order estimate by a
formula such as

$$ Z = Z_{\rm PL} + \frac{1}{Z_{\rm PL}} \ln(Z_t/Z_{\rm PL}). \tag{32} $$

Generically, $Z_t = \Delta/\sqrt{V}$ is a Student t-like variable, where $\Delta$ is the difference
of the maximum likelihood value of $\theta$ (the parameter of interest) from its
value under the null hypothesis, and $V$ is a variance estimate derived from
the Fisher Information $\partial^2 L/\partial\theta^2$. The attraction of the method is to achieve
simple formulas with accurate results. However, the mathematics becomes
more complex [39] when nuisance parameters are included, as is needed when
the background is imperfectly known. In the present paper, we do not apply
this method.



7 Comparison of results for some example data 

In this section, we illustrate the various methods using several interesting test
cases from the HEP and GRA literature. The input values and published
Z-value results are shown in Table 1 in boldface at the top of the table;
typically in HEP cases, the values reported in the papers are $n_{on}$,
$\hat{\mu}_b$, and $\sigma_b$, while in GRA, the reported values are $n_{on}$,
$n_{off}$, and $r$. We also include a few artificial cases for further
illustration. We take $Z_{Bi} = Z_\Gamma$ as a reference standard because of its
frequentist foundation. None of these published Z-values differed materially
from $Z_{Bi}$. In the remainder of the table, results from the various formulas
above are given, and as explained in the caption, departures from $Z_{Bi}$ are
highlighted. More detailed results for $Z_{Bi}$, $Z_N$, and $Z_{PL}$ are in Sec. 9.

There are numerical issues to be faced in evaluation of the more complex 
methods. The Binomial is straightforward in its Beta function representation. 
The Bayes p-value methods may involve an infinite sum, and are touchy and 
slow for large n; Ref. [32] suggests approximating the summation by an inte- 
gral. The Bayes p-value summation results are also sensitive numerically for 
large n; integer-based "exact" calculations become slow (e.g. in Mathematica), 
while floating point algorithms may have convergence difficulties. An
alternative approach is to leave the Poisson p-value $p_P$ as a $\Gamma$ function
ratio and trade an integration for the infinite sum. Doing so in the Bayes
Gaussian case is less unstable than summing, but for large n requires hints on
the location of the peak of the integrand.
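
For integer counts the binomial test can also be evaluated without the Beta
function, by summing the binomial tail with each term built in log space. The
sketch below takes that route (it is a swapped-in technique, not the Beta
function representation referred to above) and assumes the binomial parameter
$\rho = 1/(1 + r)$ and the one-sided normal conversion from p-value to Z; it does
not handle the non-integer $n_{off}$ that arises when on/off inputs are derived
from $\hat{\mu}_b$ and $\sigma_b$.

```python
import math
from statistics import NormalDist


def z_bi(n_on, n_off, r):
    """Binomial-test Z for integer n_on, n_off: p = P(X >= n_on),
    X ~ Binomial(n_on + n_off, rho), with rho = 1/(1 + r) assumed."""
    n_tot = n_on + n_off
    rho = 1.0 / (1.0 + r)
    log_rho, log_q = math.log(rho), math.log1p(-rho)
    p = 0.0
    for k in range(n_on, n_tot + 1):
        log_pmf = (math.lgamma(n_tot + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_tot - k + 1)
                   + k * log_rho + (n_tot - k) * log_q)
        p += math.exp(log_pmf)
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for case in [(4, 5, 5.0), (50, 55, 2.0), (67, 15, 0.5), (200, 10, 0.1)]:
        print(case, round(z_bi(*case), 2))
```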

The method most used in HEP, $Z_N$, produces Z's that are always larger than
those from $Z_{Bi} = Z_\Gamma$. This is confirmed in the wider scan of the parameter
space described in Sect. 9, and can be understood by the fact that the gamma
pdf, for the same inputs as the normal, shifts $\mu_b$ to higher values and smears
it more broadly than the normal does, resulting in larger tail probabilities
and thus smaller Z values. Viewing the calculation as averaging the Poisson
p-value $p_P$ over the posterior for $\mu_b$ (Eqn. 17), the shorter tails of the
normal compared to the gamma place less weight on the larger probabilities
(that is, larger p-values and smaller Z-values) obtained when the off-source
measurement happens to underestimate the true value of $\mu_b$. The difference is
most striking for small values of $r$, that is, when the background estimate is
performed with less sensitivity than the signal estimate; in this case,
results in Z differing by over 0.5 units can occur.

The most common method in GRA, $Z_{PL}$, also is always larger than $Z_{Bi}$ in a
wide scan of parameter space, but seems less vulnerable to problems at small
$r$. As further evident in Sec. 9, the relative size of $Z_N$ and $Z_{PL}$ varies
with the input parameters.

The variance stabilization method $Z_{ZR}$ presented in Ref. [6] does not appear
to be in general use in GRA, but produces results of similar quality to $Z_{Bi}$
and $Z_{PL}$. These methods agree for $N > 500$, where the normal approximations
are good, even out to 3-6 $\sigma$ tails.

The "not recommended" methods all produce results off by more than 0.5 for
several low-statistics cases. $Z_{BiN}$, which approximates $Z_{Bi}$, does best;
$Z_{nn}$ is indeed biased against real signals compared to other measures, and
its alternative $Z_{bo}$, while curing that problem, overestimates significance
as the price for its less efficient use of information compared to $Z_{BiN}$.

As expected, ignoring the uncertainty in the background estimate leads to
overestimates of the significance. $Z_{sb} = s/\sqrt{\hat{\mu}_b}$ is much more
over-optimistic than an exact Poisson calculation of Eqn. 15. The implicit
Gaussian approximation underestimates the Poisson tail at large $n_{on}$; there
is in addition a smaller bias towards $Z_{sb} > Z_P$ from ignoring the
discreteness of the Poisson sum. Any method ignoring background uncertainty
overestimates significance, particularly for small $n_{tot}$, or $r < 1$, where
the background uncertainty is most important. For $s > 0$, one can show that
$Z_{sb} > Z_{ssb} > Z_{nn}$ and that $Z_{sb} > Z_{bo} > Z_{nn}$. (The best that can be
said for $Z_{sb}$ is that it is mostly monotonic in the true significance, so that
when used for a speedy optimization of selection criteria with $n_{on}$ varying
by an order of magnitude at most, it is not too misleading.)
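
These orderings are easy to check numerically. The sketch below uses the
closed forms $Z_{sb} = s/\sqrt{\hat{\mu}_b}$, $Z_{ssb} = s/\sqrt{\hat{\mu}_b + s}$,
$Z_{bo} = s/\sqrt{V_{bo}}$ (Eqn. 29), and $Z_{nn} = s/\sqrt{n_{on} + n_{off}/r^2}$,
and assumes $\hat{\mu}_b = n_{off}/r$ and $s = n_{on} - \hat{\mu}_b$; the function
name is illustrative.

```python
import math


def crude_z(n_on, n_off, r):
    """Closed-form significance estimates for on/off inputs."""
    mu_b = n_off / r   # assumed background estimate
    s = n_on - mu_b    # assumed signal estimate
    return {
        "sb": s / math.sqrt(mu_b),
        "ssb": s / math.sqrt(mu_b + s),
        "bo": s / math.sqrt(n_off * (1.0 + r) / r**2),
        "nn": s / math.sqrt(n_on + n_off / r**2),
    }


if __name__ == "__main__":
    z = crude_z(50, 55, 2.0)
    print({name: round(value, 2) for name, value in z.items()})
```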

One can also show that $Z_{bo} > Z_{BiN}$; that $Z_{nn} > Z_{BiN}$ for $r < 1$, i.e.,
poorly determined background; that $Z_{bo} > Z_{ssb}$ for $r > \hat{\mu}_b/s$, i.e.,
for well-determined backgrounds; and that $Z_{ZR} < Z_{PL}$, unless $r$ is very
small. Thus most of the non-recommended methods over-estimate Z, except for
$Z_{nn}$ and $Z_{ssb}$, which are too low for moderate $r$, and too high for small
$r$. In general, small $r$ (poorly measured backgrounds) gives many methods
problems; results are generally more stable for an adequate control region.

Of the ad-hoc corrections for signal uncertainty, none are reliable; the
"corrected" Poisson calculation is less biased than the uncorrected, but still
widely overestimates significance for $r < 1$. The attempt to include
background uncertainty with $s/\sqrt{\hat{\mu}_b + \sigma_b}$ isn't much better
than its "un-corrected" version.

To summarize our provisional conclusions from these examples, most bad
approximations overestimate significance (the only exceptions are $Z_{nn}$ for
$r > 1$, $Z_{ssb}$, and Poisson with $\hat{\mu}_b \to \hat{\mu}_b + \sigma_b$). Thus,
prudence demands using a formula with well-understood properties, in order to
not overstate the true significance. In the next sections, we study the most
promising of these methods in detail.



8 Application of three recipes to the two problems 

For detailed coverage studies, we examine the three recipes for Z-values in
Secs. 3-5:

• $Z_{Bi}$ (= $Z_\Gamma$) takes as input $n_{on}$, $n_{off}$, and $r$.

• $Z_N$ takes as input $n_{on}$, $\hat{\mu}_b$, and $\sigma_b$.

• $Z_{PL}$ takes either set of inputs, as appropriate for computing either
$\mathcal{L}_P$ or $\mathcal{L}_G$ for the problem at hand.

It is interesting to explore the performance of each of the first two recipes
not only for the problem for which it was designed, but also (by using the
"rough correspondences" of Eqns. 6 through 8) for the other problem. (One
can also imagine studying the performance of $Z_{PL}$ using the wrong likelihood
function for the problem at hand, e.g. using $\mathcal{L}_P$ for the Gaussian-mean
background problem or vice versa; however, we do not pursue those combinations
of methods and problems here.) Since there are two cases of the Gaussian-mean
background problem, each recipe is then applied in three situations:

(1) On/off problem: One has $n_{on}$, $n_{off}$, and $r$, so $Z_{Bi}$ and $Z_{PL}$ are
computed immediately. To compute $Z_N$, the inputs are $n_{on}$; $\hat{\mu}_b$
from Eqn. 6; and $\sigma_b$ from Eqn. 7.

(2) Gaussian-mean background problem with exactly known $\sigma_b$: One has
$n_{on}$, $\hat{\mu}_b$, and $\sigma_b$, so $Z_N$ and $Z_{PL}$ are computed
immediately. For the remaining inputs required for $Z_{Bi}$, $r$ is obtained
from Eqn. 8, and then $n_{off}$ is obtained from Eqn. 6.

(3) Gaussian-mean background problem with exactly known relative uncertainty
$f$: One has $n_{on}$, $\hat{\mu}_b$, and $f$, from which $\sigma_b$ is estimated
by $f \hat{\mu}_b$, and then $Z_N$ and $Z_{PL}$ are computed. One can then also
proceed to compute $Z_{Bi}$ as in the previous case.

We emphasize again that only $Z_{Bi}$ applied to the on/off problem is
guaranteed not to undercover based on the formal theory of statistics. The
recipe for $Z_N$ mixes frequentist and Bayesian statistics even for the
Gaussian-mean background problem, and when applied to the on/off problem
further approximates the Poisson background as Gaussian. Applying $Z_{Bi}$ to
the Gaussian-mean background problem does the reverse, by approximating the
Gaussian background as Poisson. As noted, $Z_{PL}$ is an approximation based on
an asymptotic theorem.
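
The input conversions used in situations (1) through (3) can be sketched as
follows. Eqns. 6 through 8 are not reproduced in this section, so the forms
used here ($\hat{\mu}_b = n_{off}/r$, $\sigma_b = \sqrt{n_{off}}/r$, and
$r = \hat{\mu}_b/\sigma_b^2$) are assumptions, chosen to agree with the
expressions for $r$ quoted in Sec. 9; the function names are illustrative.

```python
import math


def onoff_to_gaussian(n_off, r):
    """Situation (1): derive (mu_b_hat, sigma_b) from on/off inputs."""
    return n_off / r, math.sqrt(n_off) / r


def gaussian_to_onoff(mu_b_hat, sigma_b):
    """Situation (2): derive (r, n_off) from a Gaussian-mean estimate."""
    r = mu_b_hat / sigma_b**2
    return r, mu_b_hat * r


def relative_to_onoff(mu_b_hat, f):
    """Situation (3): sigma_b is estimated as f * mu_b_hat, then as in (2)."""
    return gaussian_to_onoff(mu_b_hat, f * mu_b_hat)


if __name__ == "__main__":
    print(gaussian_to_onoff(1.3, 0.3))
    print(onoff_to_gaussian(55, 2.0))
```

Note that the derived $n_{off}$ is in general not an integer.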



9 Frequentist evaluation of the performance of the various recipes 

In the frequentist evaluation of p-values, one considers particular true values
of the background mean $\mu_b$ in the signal region and of another parameter
characterizing the experimental setup, namely $r$ for on/off experiments or
$f = \sigma_b/\mu_b$ for the Gaussian-mean background experiments. For each fixed
pair of such parameters and each recipe, an ensemble of experimental
measurements is considered appropriate to the relevant problem described above.
For each set of measurements corresponding to an experiment, one proceeds as
follows. In evaluating the performance of $Z_{Bi}$, $Z_N$, and $Z_{PL}$, Z is
computed according to a recipe and compared to a value $Z_{claim}$ (e.g.,
$Z_{claim} = 5$). In the ensemble of experiments, one calculates the fraction of
those experiments which obtain $Z \ge Z_{claim}$ according to the recipe; this is
the true Type I error rate for that recipe and a significance level
corresponding to that value of $Z_{claim}$. One can then substitute this true
Type I error rate for p in Eqn. 3 in order to obtain the Z-value that we call
$Z_{true}$.
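
The conversion between a tail probability and Z can be sketched as below. It
assumes that Eqn. 3 is the one-sided normal tail, $p = 1 - \Phi(Z)$, which
reproduces the correspondence between Z = 5 and $p = 2.87 \times 10^{-7}$ used in
this section; the function names are illustrative.

```python
import math
from statistics import NormalDist


def p_from_z(z):
    """One-sided normal tail probability, p = 1 - Phi(z), via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_from_p(p):
    """Inverse of p_from_z; working in the lower tail avoids forming 1 - p."""
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for z in (1.28, 3.0, 5.0):
        print(z, p_from_z(z))
```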

We note that each recipe implicitly chooses its own ordering of the points in
the $(n_{on}, n_{off})$ space (or equivalently in the $(n_{on}, \hat{\mu}_b)$ space):
contours of equal Z in each space will be different for each recipe. If two
recipes both faithfully provide the significance levels, then as Neyman and
Pearson pointed out, to distinguish between them one must compare their power
for rejecting relevant alternative hypotheses. As noted in the Introduction,
in this paper we do not pursue such considerations of power.

A recipe is "conservative" and we say that it "overcovers" (borrowing
language from confidence intervals) with respect to a particular problem and a
particular $Z_{claim}$ if the true ensemble Type I error rate is smaller than
implied (so that $Z_{true} > Z_{claim}$). We say that it "undercovers" if the Type I
error rate is higher (so that $Z_{true} < Z_{claim}$). While neither departure from
the correct Type I error rate is desirable, undercoverage is generally
considered to be more of a flaw than overcoverage. Of the combinations of
problems and recipes under consideration here, only the application of $Z_{Bi}$
to the Poisson on/off problem is guaranteed by construction to have
$Z_{true} \ge Z_{claim}$, i.e., not to have undercoverage.



For purposes of illustration, we have selected three values of $Z_{claim}$ (1.28,
3, and 5), corresponding via Eqn. 3 to p-values of 0.1, $1.35 \times 10^{-3}$, and
$2.87 \times 10^{-7}$, respectively. In order to calculate the Type I error rate,
one needs the probability of obtaining $Z \ge Z_{claim}$. Although we compute this
probability directly, we mention the alternate method of Monte Carlo
simulation, which we use as a crosscheck for our results. For example, for the
on/off problem, given $\mu_b$, $r$, and $Z_{claim}$, one samples $n_{on}$ and
$n_{off}$ from the appropriate distributions and counts the number of times the
recipe yields a value of $Z \ge Z_{claim}$. While this method remains useful as a
cross-check, for more efficient evaluation of $Z_{true}$, we calculate discrete
probabilities directly from the Poisson formula and sum them, and evaluate
tail integrals of normal probabilities using the error function erf, using a
binary search to find how much of the tail yields results with
$Z \ge Z_{claim}$. Details for each case are described in Appendix D.
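
A minimal sketch of the direct summation for the on/off problem is given
below; it is not the implementation of Appendix D. It assumes that $n_{on}$ and
$n_{off}$ are Poisson distributed with means $\mu_b$ and $r \mu_b$ under the
background-only hypothesis, that the recipe increases monotonically with
$n_{on}$ at fixed $n_{off}$, and, as the recipe under test, the Li and Ma
expression for $Z_{PL}$ in its commonly quoted form. The summation cutoffs and
all names are illustrative.

```python
import math
from statistics import NormalDist


def z_pl(n_on, n_off, r):
    """Li-Ma form of the profile-likelihood Z; zero when there is no excess."""
    if n_on * r <= n_off:
        return 0.0
    n_tot = n_on + n_off
    t_on = n_on * math.log(n_on * (1.0 + r) / n_tot)
    t_off = n_off * math.log(n_off * (1.0 + r) / (n_tot * r)) if n_off > 0 else 0.0
    return math.sqrt(2.0 * (t_on + t_off))


def poisson_pmf(n, mu):
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))


def z_true(mu_b, r, z_claim, recipe=z_pl):
    """Z corresponding to the true Type I error rate of `recipe` at z_claim."""
    mu_off = r * mu_b
    hi_on = int(mu_b + 12.0 * math.sqrt(mu_b) + 30)      # illustrative cutoffs
    hi_off = int(mu_off + 12.0 * math.sqrt(mu_off) + 30)
    # tail[n] = P(n_on >= n), truncated at hi_on
    tail = [0.0] * (hi_on + 2)
    for n in range(hi_on, -1, -1):
        tail[n] = tail[n + 1] + poisson_pmf(n, mu_b)
    alpha = 0.0
    for n_off in range(hi_off + 1):
        # smallest n_on for which the recipe reaches z_claim
        n_min = next((n for n in range(hi_on + 1)
                      if recipe(n, n_off, r) >= z_claim), None)
        if n_min is not None:
            alpha += poisson_pmf(n_off, mu_off) * tail[n_min]
    return -NormalDist().inv_cdf(alpha)


if __name__ == "__main__":
    print(z_true(100.0, 1.0, 5.0))
```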

For the on/off experiments analyzed using the $Z_{Bi}$ recipe, the results are
displayed in Figs. 1 through 3. Each plot corresponds to a particular value of
$Z_{claim}$, and for each point $(r, \mu_b)$ chosen on a fine grid of 50 by 50
points, $Z_{true} - Z_{claim}$ is indicated. As with all these figures, the right
plot is a zoomed-in version of the left. The value indicated in each pixel is
calculated using the $(r, \mu_b)$ of its lower left corner. As expected from the
construction, $Z_{true} \ge Z_{claim}$ everywhere; the overcoverage is significant
for small values of counts, where the discreteness is most relevant, as seen
in the lower left corner of the zoomed-in version of each figure. This
overcoverage could be reduced by using the non-standard intervals for the
ratio of Poisson means in Ref. [22], but we do not pursue that option in this
paper.

At the limit of numerical precision in our implementation, it turns out that the
result errs in the conservative direction, but of course extreme caution should
be used to avoid quoting a result badly affected in this way. The highest
calculated value of $Z_{true}$ is nearly 7.6 (corresponding to a p-value of
$\sim 10^{-14}$) due to the machine limit of our implementation of the
calculation of $Z_{true}$ from the p-value; this can be alleviated by using the
approximation in Eqn. 4, but we do not pursue that option in this paper, and
leave blank those regions in the plot where the associated p-value is less
than $\sim 10^{-14}$.
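
The origin of such a limit can be illustrated in double precision: forming
$1 - p$ discards a very small p entirely, whereas inverting the lower tail
does not. The sketch below also includes the standard large-Z asymptotic
expansion $Z \approx \sqrt{u - \ln u}$ with $u = -2 \ln(p \sqrt{2\pi})$; Eqn. 4 is
not reproduced in this section, so whether it has exactly this form is an
assumption. The precise threshold depends on the implementation and need not
coincide with the $\sim 10^{-14}$ quoted above.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def z_lower_tail(p):
    """Invert the lower tail directly, so that 1 - p is never formed."""
    return -_NORMAL.inv_cdf(p)


def z_asymptotic(p):
    """Large-Z asymptotic approximation to the one-sided normal quantile."""
    u = -2.0 * math.log(p * math.sqrt(2.0 * math.pi))
    return math.sqrt(u - math.log(u))


if __name__ == "__main__":
    print(1.0 - 1e-17 == 1.0)  # True: the tail information is lost
    print(z_lower_tail(1e-14), z_asymptotic(1e-14))
```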

When using the $Z_N$ recipe to analyze the on/off experiments (Figs. 4 through
6), there is a large region in which the method undercovers (by as much as two
units of Z at very low $r$) with the extent of the region depending on
$Z_{claim}$. This is in accord with Cranmer [12], who, using the Monte Carlo
method, finds for a specific case ($\mu_b = 100$, $r = 1$) that the $Z_N$ recipe
undercovers for $Z_{claim} = 5$, with a Type I error rate corresponding to
$Z_{true} = 4.2$. Again, there is overcoverage due to discreteness at small values
of $\mu_b$ and $r$.

The results of using the profile likelihood method to analyze the on/off
experiments are shown in Figs. 7 through 9. There is slight undercoverage over
much of the parameter space, by at most half a unit of Z or so. As the true
parameters $\mu_b$ and $r$ move away from the origin, initially there is
overcoverage caused by discreteness, giving way to the region of largest
undercoverage for $Z_{PL}$, which then becomes only slight undercoverage as
near-asymptotic performance is reached. At the point considered by Cranmer
[12] of $\mu_b = 100$, $r = 1$, we calculate $Z_{true} = 4.99$, in good agreement
with the result from his MC method of $Z_{true} = 5.0$. For small $\mu_b$ and large
$r$ (with the qualifiers small and large becoming stricter for increasing
$Z_{claim}$), the nominal coverage is achieved.

For the Gaussian-mean background problem with exactly known $\sigma_b$, the
results are in Figs. 10 through 12 when analyzed with $Z_{Bi}$; in Figs. 13
through 15 when analyzed with $Z_N$; and in Figs. 16 through 18 when analyzed
with $Z_{PL}$. $Z_{Bi}$ overcovers everywhere, quite severely for the larger
values of $f$ considered; this is an effect of small estimates of $\mu_b$ leading
by the rough correspondence of Eqn. 8 to underestimates of the
shape-controlling parameter $r = \hat{\mu}_b/\sigma_b^2$, and thus to an overly
broad and shifted gamma distribution which in turn leads to estimated tail
probabilities which are inappropriately large. $Z_N$ provides slight
over-coverage and no undercoverage for $Z_{claim} = 1.28$ and $Z_{claim} = 3$, but
it undercovers for $Z_{claim} = 5$ at larger values of $f$ and $\mu_b$. For the
largest values of $f$ in Fig. 15, the reduction in undercoverage is an artifact
of using the truncated Gaussian model for the uncertainty in the mean
background, as the condition of Eqn. 18 comes into play. $Z_{PL}$ has good
coverage over the entire parameter space shown, with some effect of
discreteness observable.

For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, the results are in Figs. 19 through 21 when analyzed with
$Z_{Bi}$; in Figs. 22 through 24 when analyzed with $Z_N$; and in Figs. 25 through
27 when analyzed with $Z_{PL}$. Both $Z_{Bi}$ and $Z_N$ give good coverage for small
values of $f$ and small $\mu_b$, but both undercover for large regions of the
parameter space, with $Z_{Bi}$ performing slightly better in some regions. The
undercoverage of $Z_{Bi}$ is an effect of small estimates of $\mu_b$ leading by the
rough correspondence of Eqn. 8 to overestimates of the shape-controlling
$r = 1/(f^2 \hat{\mu}_b)$, and thus to an overly narrow gamma distribution, which
in turn leads to estimated tail probabilities which are inappropriately small.
For $Z_{PL}$, the region of good coverage is smaller in $\mu_b$ and $f$ than for
either of $Z_{Bi}$ or $Z_N$, but like the latter two, the profile likelihood
method also undercovers for a large part of the parameter space for this
problem.

In all of the results shown for $Z_{PL}$ for the Gaussian-mean background problem
(Figs. 16 through 18 and Figs. 25 through 27), we assume (as we believe to be
common practice) that the experimenter is truncating the Gaussian pdf for
$\hat{\mu}_b$ at zero, i.e., setting $P(\hat{\mu}_b|\mu_b) = 0$ for
$\hat{\mu}_b < 0$ and renormalizing. This results in a denominator for
$\mathcal{L}_G$ which depends on $\mu_b$, and the determination of $Z_{PL}$ is
performed numerically. As with the discussion of Gaussian truncation above
for $p(\mu_b)$, if this ad hoc procedure makes any material difference, one should
explore other functional forms. As a check, we removed the truncation, i.e.,
used Eqn. 21 as it is written. The only perceptible difference is in Figs. 16
through 18, where the slight undercoverage at $f > 0.1$ disappears.
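
Without the truncation the profile can be written in closed form. The sketch
below assumes that Eqn. 21 (not reproduced in this section) is the product of
a Poisson term for $n_{on}$ with mean $\mu_s + \mu_b$ and a Gaussian term for
$\hat{\mu}_b$ with mean $\mu_b$ and width $\sigma_b$; under that assumption the
background-only fit solves a quadratic equation for $\mu_b$. The function name
is illustrative, and the truncated case described above is not covered.

```python
import math


def z_pl_gauss(n_on, mu_b_hat, sigma_b):
    """Profile-likelihood Z for an untruncated Poisson x Gaussian likelihood."""
    if n_on <= mu_b_hat:
        return 0.0  # no excess over the background estimate
    var = sigma_b**2
    b = mu_b_hat - var
    # Background-only fit: positive root of mu^2 - b*mu - n_on*var = 0
    mu0 = 0.5 * (b + math.sqrt(b * b + 4.0 * n_on * var))
    # -2 ln(lambda); the unconstrained fit has mu_b = mu_b_hat, mu_s + mu_b = n_on
    q = 2.0 * (n_on * math.log(n_on / mu0) - (n_on - mu0)
               + (mu0 - mu_b_hat)**2 / (2.0 * var))
    return math.sqrt(q)


if __name__ == "__main__":
    print(round(z_pl_gauss(4, 1.0, math.sqrt(0.2)), 2))
    print(round(z_pl_gauss(200, 100.0, math.sqrt(1000.0)), 2))
```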



10 Conclusion 

As seen in these simple prototype problems, naive use of a recipe for including
systematic errors can lead to significant departure from the claimed Z. For a
true on/off problem (sideband estimate of background in a binned analysis),
$Z_{Bi} = Z_\Gamma$ avoids undercoverage by construction, but can be quite
conservative for small numbers of events, at least when the standard intervals
for ratio of Poisson means are used. Since undercoverage is usually considered
to be worse than overcoverage, we recommend $Z_{Bi}$ be considered for general
use in this problem; for a range of values, it is conveniently implemented in
ROOT, as illustrated in Appendix E. However, one should be aware of the
overcoverage with small numbers of events, and perhaps consider use of
alternative intervals for the binomial parameter or the ratio of Poisson
means. Consistent with long experience in HEP and GRA and as noted by Rolke et
al. [9,10], the profile likelihood-derived $Z_{PL}$ provides a strikingly good
approximation in most of the parameter space, with at most modest
under-coverage; thus $Z_{PL}$ should also be routinely calculated, especially
given the easy use of the formula of Li and Ma, Eqn. 25.
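
For reference, a minimal sketch of the Li and Ma expression follows. Eqn. 25
is not reproduced in this section, so the form used here is the commonly
quoted one, rewritten in terms of $r$ (the ratio that Li and Ma denote by
$1/\alpha$); this correspondence and the function name are assumptions.

```python
import math


def z_pl(n_on, n_off, r):
    """Li-Ma profile-likelihood significance for on/off counts."""
    if n_on * r <= n_off:
        return 0.0  # no excess of on-source counts
    n_tot = n_on + n_off
    t_on = n_on * math.log(n_on * (1.0 + r) / n_tot)
    # The n_off -> 0 limit of the second term is zero
    t_off = n_off * math.log(n_off * (1.0 + r) / (n_tot * r)) if n_off > 0 else 0.0
    return math.sqrt(2.0 * (t_on + t_off))


if __name__ == "__main__":
    for case in [(4, 5, 5.0), (50, 55, 2.0), (67, 15, 0.5), (200, 10, 0.1)]:
        print(case, round(z_pl(*case), 2))
```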

For the Gaussian-mean background problem, $Z_{Bi}$ works as well as or better
than $Z_N$ in much of the space; for extremely small uncertainties on a large
mean background, the implementation in ROOT can be supplemented using
Ref. [21]. The profile likelihood method performs extremely well for exactly
known $\sigma_b$. For the case of exactly known relative $f$, all three methods
have severe under-coverage for high values of $Z_{claim}$ and $f > 0.1$. Since
$Z_{Bi}$ and $Z_N$ are not well-founded for the Gaussian-mean background problem,
and since the profile likelihood is based on asymptotic theory, checks of
coverage in the region of application are essential.

This paper explores only three recipes for two simple problems; of course, it is
of interest to extend the studies to other recipes and more complex problems.
For example, if the background in the signal region has several components,
each estimated in a separate subsidiary experiment, one can attempt to
summarize this information approximately and apply single-component methods.
(One could try both an approximate $Z_N$ and a scaled $Z_{Bi}$ where the scaling
reflects the ratio $f = \sigma_b/\mu_b$.) $Z_{PL}$ can be extended to likelihood
functions describing all components. As problems become more complex, exact
coverage by construction is not likely to be achieved, since even when a
full-blown Neyman construction is feasible (guaranteeing no undercoverage), it
typically leads to overcoverage. When approximations such as combining
background components are made, one should check the coverage with a full
simulation reflecting the individual components.

As Monte Carlo simulation or numerical integration is often used even for the
simplest problems, the fact that $Z_{PL}$ has the simple expression in Eqn. 25 is
extremely useful both for checking the results of a simulation and for
providing a speedy evaluation (e.g. in GRA when data from many segments of the
sky must be monitored in real time). While for some parameters evaluation of
$Z_{Bi}$ encounters numerical problems, its expression in terms of the incomplete
beta function is also quite convenient.

All of these issues become even more severe as Z values as high as 5 or even
higher are sought or quoted, as is common in high energy physics. The implied
tail probability of $2.87 \times 10^{-7}$ should be used with caution, as it can
be extremely sensitive to underlying assumptions. While this paper explores
the coverage assuming that the model is correct, for high Z values one is of
course also susceptible to modeling errors, for example non-Gaussian tails in
the uncertainties.



Acknowledgements 

We thank Kyle Cranmer and Luc Demortier for numerous enlightening discus- 
sions, pointers to references, and for insightful comments on earlier versions 
of this work. J.L. wishes to thank LANL for hospitality and financial support 
during his sabbatical; Tom Loredo for Ref. [6]; and James Berger for hos- 
pitality at the SAMSI 2006 Institute, and acknowledges useful conversations 
there with professors John Hartigan and Joel Heinrich, which helped him toward
the proof that $Z_{Bi} = Z_\Gamma$. This work was partially supported by the U.S.
Department of Energy and the National Science Foundation.






Reference                                                 [40]      [41]      [42]      [43]      [44]      [44]      [45]      [46]      [47]      [48]
$n_{on}$                                                     4         6         9        17        50        67       200       523    498426   2119449
$n_{off}$                                                    5     18.78     17.83     40.11        55        15        10      2327    493434  23650096
$r$                                                        5.0     14.44      4.69     10.56       2.0       0.5       0.1      5.99       1.0     11.21
$\hat{\mu}_b$                                              1.0       1.3       3.8       3.8      27.5      30.0     100.0     388.6    493434   2109732
$s = n_{on} - \hat{\mu}_b$                                 3.0       4.7       5.2      13.2      22.5        37       100       134      4992      9717
$\sigma_b$                                               0.447       0.3       0.9       0.6      3.71      7.75      31.6       8.1     702.4     433.8
$f = \sigma_b/\hat{\mu}_b$                               0.447     0.231     0.237     0.158     0.135     0.258     0.316    0.0207   0.00142  0.000206
Reported p                                                         0.003     0.027     2E-06
Reported Z                                                           2.7       1.9       4.6                                     5.9       5.0       6.4

See Conclusion:
$Z_{Bi} = Z_\Gamma$ Binomial                              1.66      2.63      1.82      4.46      2.93      2.89      2.20      5.93      5.01      6.40
$Z_N$ Bayes Gaussian                                      1.88      2.71      1.94      4.55      3.0?      3.44      2.90      5.9?      5.02      6.40
$Z_{PL}$, $\mathcal{L}_P$ Profile Likelihood              1.95      2.82      1.99      4.57      3.02      3.04      2.38      5.95      5.01      6.40
$Z_{PL}$, $\mathcal{L}_G$ Profile Likelihood              2.00      2.83      2.02      4.62      3.10      3.45      2.90      5.9?      5.02      6.40
$Z_{ZR}$ variance stabilization                           1.93      2.66      1.98      4.22      3.00      3.07      2.39      5.86      5.01      6.40

Not Recommended:
$Z_{BiN} = s/\sqrt{n_{tot}/r}$                            2.24      3.59      2.17      5.67      3.11      2.89      2.18      6.16      5.01      6.41
$Z_{nn} = s/\sqrt{n_{on} + n_{off}/r^2}$                  1.46      1.90      1.66      3.17      2.82      3.28      2.89      5.54      5.01      6.40
$Z_{ssb} = s/\sqrt{\hat{\mu}_b + s}$                      1.50      1.92      1.73      3.20      3.18      4.52      7.07      5.88      7.07      6.67
$Z_{bo} = s/\sqrt{n_{off}(1 + r)/r^2}$                    2.74      3.99      2.42      6.47      3.50      3.90      3.02      6.31      5.03      6.41

Ignore $\sigma_b$:
Poisson: ignore $\sigma_b$                                2.08      2.84      2.14      4.87      3.80      5.76      8.76      6.44      7.09      6.69
$Z_{sb} = s/\sqrt{\hat{\mu}_b}$                           3.00      4.12      2.67      6.77      4.29      6.76     10.00      6.82      7.11      6.69

Unsuccessful ad hockery:
Poisson: $\hat{\mu}_b \to \hat{\mu}_b + \sigma_b$         1.56      2.51      1.64      4.47      3.04      4.24      5.51      6.01      6.09      6.39
$s/\sqrt{\hat{\mu}_b + \sigma_b}$                         2.49      3.72      2.40      6.29      4.03      6.02      8.72      6.75      7.10      6.69


Table 1

Test Cases and Significance Results. In the top section, the primary input numbers
from the papers are in boldface, with derived numbers (using Eqns. 5-8) in normal
font. $Z_{PL}$ is shown for both $\mathcal{L}_P$ and $\mathcal{L}_G$ regardless of the
primary input numbers. The test cases are ordered in data counts; [44], [45], and
[47] have small values of $r$, troublesome for some methods. Below the top section,
Z-values in boldface are nearly equal to the reference $Z_{Bi}$, while Z-values in
italics differ by more than 0.5.

Fig. 1. For the on/off problem analyzed using the $Z_{Bi}$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.



Fig. 2. For the on/off problem analyzed using the $Z_{Bi}$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 3$, i.e., a p-value of
$1.35 \times 10^{-3}$ or smaller.



Fig. 3. For the on/off problem analyzed using the $Z_{Bi}$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 5$, i.e., a p-value of
$2.87 \times 10^{-7}$ or smaller.

Fig. 4. For the on/off problem analyzed using the $Z_N$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.



Fig. 5. For the on/off problem analyzed using the $Z_N$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 3$, i.e., a p-value of
$1.35 \times 10^{-3}$ or smaller.



Fig. 6. For the on/off problem analyzed using the $Z_N$ recipe, for each fixed value
of $r$ and $\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the
ensemble of experiments quoting $Z_{claim} \ge 5$, i.e., a p-value of
$2.87 \times 10^{-7}$ or smaller.



Fig. 7. For the on/off problem analyzed using the profile likelihood method, for
each fixed value of $r$ and $\mu_b$, the plot indicates the calculated
$Z_{true} - Z_{claim}$ for the ensemble of experiments quoting $Z_{claim} \ge 1.28$,
i.e., a p-value of 0.1 or smaller.

Fig. 8. For the on/off problem analyzed using the profile likelihood method, for
each fixed value of $r$ and $\mu_b$, the plot indicates the calculated
$Z_{true} - Z_{claim}$ for the ensemble of experiments quoting $Z_{claim} \ge 3$,
i.e., a p-value of $1.35 \times 10^{-3}$ or smaller.

Fig. 9. For the on/off problem analyzed using the profile likelihood method, for
each fixed value of $r$ and $\mu_b$, the plot indicates the calculated
$Z_{true} - Z_{claim}$ for the ensemble of experiments quoting $Z_{claim} \ge 5$,
i.e., a p-value of $2.87 \times 10^{-7}$ or smaller.



Fig. 10. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_{Bi}$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.

Fig. 11. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_{Bi}$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 3$, i.e., a p-value of $1.35 \times 10^{-3}$ or
smaller.

Fig. 12. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_{Bi}$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or
smaller.



Fig. 13. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_N$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.

Fig. 14. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_N$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 3$, i.e., a p-value of $1.35 \times 10^{-3}$ or
smaller.

Fig. 15. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the $Z_N$ recipe, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{true} - Z_{claim}$ for the ensemble of
experiments quoting $Z_{claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or
smaller.

Fig. 16. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the profile likelihood method, for each fixed value of
$f = \sigma_b/\mu_b$ and $\mu_b$, the plot indicates the calculated
$Z_{true} - Z_{claim}$ for the ensemble of experiments quoting $Z_{claim} \ge 1.28$,
i.e., a p-value of 0.1 or smaller.

Fig. 17. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the profile likelihood method, for each fixed value of
$f = \sigma_b/\mu_b$ and $\mu_b$, the plot indicates the calculated
$Z_{true} - Z_{claim}$ for the ensemble of experiments quoting $Z_{claim} \ge 3$,
i.e., a p-value of $1.35 \times 10^{-3}$ or smaller.



Fig. 18. For the Gaussian-mean background problem with exactly known $\sigma_b$,
analyzed using the profile likelihood method, for each fixed value of $f = \sigma_b/\mu_b$ and
$\mu_b$, the plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments
quoting $Z_{\rm claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or smaller.

Fig. 19. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_{\rm Bi}$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.

Fig. 20. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_{\rm Bi}$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 3$, i.e., a p-value of $1.35 \times 10^{-3}$ or smaller.

Fig. 21. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_{\rm Bi}$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or smaller.

Fig. 22. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_N$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.



Fig. 23. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_N$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 3$, i.e., a p-value of $1.35 \times 10^{-3}$ or smaller.

Fig. 24. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the $Z_N$ recipe, for each fixed value of $f$ and $\mu_b$, the
plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of experiments quoting
$Z_{\rm claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or smaller.

Fig. 25. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the profile likelihood method, for each fixed value
of $f$ and $\mu_b$, the plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of
experiments quoting $Z_{\rm claim} \ge 1.28$, i.e., a p-value of 0.1 or smaller.

Fig. 26. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the profile likelihood method, for each fixed value
of $f$ and $\mu_b$, the plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of
experiments quoting $Z_{\rm claim} \ge 3$, i.e., a p-value of $1.35 \times 10^{-3}$ or smaller.




Fig. 27. For the Gaussian-mean background problem with exactly known relative
uncertainty $f$, analyzed using the profile likelihood method, for each fixed value
of $f$ and $\mu_b$, the plot indicates the calculated $Z_{\rm true} - Z_{\rm claim}$ for the ensemble of
experiments quoting $Z_{\rm claim} \ge 5$, i.e., a p-value of $2.87 \times 10^{-7}$ or smaller.



A Notation 



Table A.1 defines the variables used in this paper.

Table A.1

Symbol            Definition
----------------  ------------------------------------------------------------------
$n_{\rm off}$     total observed in "off" (background) region
$n_{\rm on}$      total observed in "on" (signal) region
$n_{\rm tot}$     $n_{\rm on} + n_{\rm off}$
$\mu_s$           true signal mean in "on" (signal) region
$\mu_b$           true background mean in "on" (signal) region
$\hat\mu_b$       estimate of background mean in "on" (signal) region
$\sigma_b$        uncertainty on estimate $\hat\mu_b$ in "on" region
$S$               estimate of signal events in the "on" region $= n_{\rm on} - \hat\mu_b$
$f$               relative uncertainty on $\hat\mu_b$; $\sigma_b/\mu_b$
$\mu_{\rm on}$    true total mean in signal region $= \mu_s + \mu_b$
$\mu_{\rm off}$   true background mean in "off" (background) region
$\mu_{\rm tot}$   true total mean in "on" plus "off" regions $= \mu_{\rm on} + \mu_{\rm off}$
$\tau$            ratio of background means in "off" and "on" regions: $\mu_{\rm off}/\mu_b$
$\lambda$         ratio of Poisson means $\mu_{\rm off}/\mu_{\rm on}$
$\rho$            binomial parameter $\mu_{\rm on}/\mu_{\rm tot}$



B Derivation of approximate tail area of normal distribution 

With $\Phi(Z) = 1 - p$ defined in Eqns. 1-2, we derive Eqn. 4 by starting with
the large-$Z$ expansion of $1 - \Phi(Z)$, where $\Phi$ is the cumulative distribution of the
normal density $\phi(Z)$, given as asymptotic expansion 26.2.12 in Ref. [49]:

$$p = 1 - \Phi(Z) \approx \frac{\phi(Z)}{Z}\left(1 - \frac{1}{Z^2} + \cdots\right) \qquad (B.1)$$

Then we follow Ref. [5] by neglecting the higher order terms,

$$p \approx \frac{\phi(Z)}{Z} = \frac{1}{Z\sqrt{2\pi}}\, e^{-Z^2/2}, \qquad (B.2)$$

so that

$$\ln(p\sqrt{2\pi}) = -Z^2/2 - \ln Z. \qquad (B.3)$$

Defining $u$ by further manipulation of the left side,

$$u = -2\ln(p\sqrt{2\pi}) = Z^2 + \ln Z^2, \qquad (B.4)$$

and substituting $Z^2 \approx u$ into $\ln Z^2$ (the value obtained by initially neglecting this
term), we obtain

$$Z^2 = u - \ln Z^2 \approx u - \ln u. \qquad (B.5)$$

Thus

$$Z \approx \sqrt{u - \ln u}. \qquad (B.6)$$
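
As a quick numerical check of Eqns. B.4 and B.6, the following minimal Python sketch (standard library only) compares the approximation with the exact inverse of the normal tail area. The function names are illustrative and are not taken from the paper.

```python
import math
from statistics import NormalDist


def z_from_p_approx(p):
    """Approximate Z for a one-sided tail probability p, using Z ~ sqrt(u - ln u)."""
    # u as defined in Eqn. B.4; positive only when p * sqrt(2*pi) < 1
    u = -2.0 * math.log(p * math.sqrt(2.0 * math.pi))
    return math.sqrt(u - math.log(u))


def z_from_p_exact(p):
    """Z from the inverse normal CDF, for comparison."""
    return -NormalDist().inv_cdf(p)


for p in (0.1, 1.35e-3, 2.87e-7):
    print(f"p = {p:.3g}: approx Z = {z_from_p_approx(p):.4f}, "
          f"exact Z = {z_from_p_exact(p):.4f}")
```

The agreement improves as $Z$ grows, as expected for a large-$Z$ expansion.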

C Proof of the identity of $Z_{\rm Bi}$ and $Z_\Gamma$

The essence of the proof is to tie together two established identities. The first
is a "parameter mixing" [27,50] identity that relates the negative binomial
distribution to a mix of Poisson distributions with mean drawn from a Gamma
density (as found in $p_\Gamma$); the second connects binomial tail probabilities (as
found in $p_{\rm Bi}$) to negative binomial tail probabilities.

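We start by combining Eqns. 15, 17, and 19, arriving at the expression (equivalent
to Eqn. 6 of Ref. [32], also in Sec. III of Ref. [5]), using $\alpha = 1/\tau$,

$$p_\Gamma = \sum_{j=n_{\rm on}}^{\infty} \frac{\alpha^j \left(1/(1+\alpha)\right)^{1+j+n_{\rm off}} (j + n_{\rm off})!}{j!\; n_{\rm off}!}. \qquad (C.1)$$

Substituting for $\alpha$ in terms of $\rho = 1/(1+\tau) = \alpha/(1+\alpha)$:

$$p_\Gamma = \sum_{j=n_{\rm on}}^{\infty} \frac{\rho^j (1-\rho)^{1+n_{\rm off}} (j + n_{\rm off})!}{j!\; n_{\rm off}!} \qquad (C.2)$$

$$\phantom{p_\Gamma} = \sum_{j=n_{\rm on}}^{\infty} {\rm NBi}(N_{\rm on} = j;\ N_{\rm off} = n_{\rm off} + 1,\ {\rm prob}_{\rm success} = 1 - \rho). \qquad (C.3)$$
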

As indicated, the term summed can be identified as the negative binomial
[27,50] probability NBi for observing $j$ counts on-source (confusingly corresponding
to number of "failures" in the usual exposition of NBi) in less time
than it takes to observe exactly $n_{\rm off} + 1$ counts off-source (number of "successes"
$N_{\rm off}$), where as above, $\rho = \mu_{\rm on}/\mu_{\rm tot}$ is the ratio of the mean numbers of
counts on-source to the total mean on and off source (and hence $1 - \rho$ corresponds
to the usual probability for "success" in NBi). Thus, in more compact
notation,

$$p_\Gamma = {\rm NBi}(N_{\rm on} > n_{\rm on} - 1 \,|\, N_{\rm off} = n_{\rm off} + 1), \qquad (C.4)$$

thus completing the first main identity.

Now the probability for more than k on-source counts while waiting for m off- 
source counts is precisely equal [27] to the probability of finding fewer than m 
off-source counts in exactly k + m total counts for the same ratio of (on/total) 
means p. This relates a negative binomial tail probability to the (ordinary) 
binomial tail probability: 

$${\rm NBi}(N_{\rm on} > k \,|\, N_{\rm off} = m) = {\rm Bi}(N_{\rm off} < m \,|\, k + m). \qquad (C.5)$$

The left hand side of this identity matches the right hand side of Eqn. C.4 for
$k = n_{\rm on} - 1$ and $m = n_{\rm off} + 1$, so Eqn. C.4 becomes

$$p_\Gamma = {\rm Bi}(N_{\rm off} < n_{\rm off} + 1 \,|\, n_{\rm on} + n_{\rm off}) = {\rm Bi}(N_{\rm off} < n_{\rm off} + 1 \,|\, n_{\rm tot}). \qquad (C.6)$$

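Since the sum of counts is constrained in the binomial probability, the latter
expression can be re-written in terms of complementary outcomes: $N_{\rm off} < n_{\rm off} + 1$
is equivalent to $N_{\rm on} \ge n_{\rm tot} - n_{\rm off} = n_{\rm on}$, so

$$p_\Gamma = {\rm Bi}(N_{\rm on} \ge n_{\rm on} \,|\, n_{\rm tot}). \qquad (C.7)$$

Comparing with Eqn. 13 confirms that $p_{\rm Bi} = p_\Gamma$ and hence $Z_{\rm Bi} = Z_\Gamma$. This
relation was first proved by other methods in 2003 [34], but the present proof
seems to be more illuminating.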
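
The identity can also be checked numerically. The sketch below, in standard-library Python with illustrative function names, evaluates the negative binomial tail sum of Eqn. C.2 (truncated once the terms become negligible, which is an assumption of this sketch) and the binomial tail of Eqn. C.7, and compares the two for integer counts.

```python
import math


def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def p_gamma(n_on, n_off, tau, rel_tol=1e-16):
    """Negative binomial tail sum of Eqn. C.2, truncated when terms are negligible."""
    rho = 1.0 / (1.0 + tau)
    log_rho, log_1m_rho = math.log(rho), math.log1p(-rho)
    # Mean number of "failures"; the terms decrease once j is beyond it.
    nb_mean = (n_off + 1) * rho / (1.0 - rho)
    total, j = 0.0, n_on
    while True:
        term = math.exp(log_choose(j + n_off, j)
                        + j * log_rho + (1 + n_off) * log_1m_rho)
        total += term
        if j > nb_mean and term <= rel_tol * total:
            return total
        j += 1


def p_bi(n_on, n_off, tau):
    """Binomial tail of Eqn. C.7: P(N_on >= n_on | n_tot) with parameter rho."""
    rho = 1.0 / (1.0 + tau)
    log_rho, log_1m_rho = math.log(rho), math.log1p(-rho)
    n_tot = n_on + n_off
    return sum(math.exp(log_choose(n_tot, j)
                        + j * log_rho + (n_tot - j) * log_1m_rho)
               for j in range(n_on, n_tot + 1))


print(p_gamma(140, 100, 1.2), p_bi(140, 100, 1.2))
```

For $n_{\rm on} = 140$, $n_{\rm off} = 100$, $\tau = 1.2$ the two sums should agree to within floating-point precision.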



D Details of the Calculations of $Z_{\rm true}$

This Appendix provides more details of the calculation of $Z_{\rm true}$ in Sec. 9.

D.1 Details of calculation of $Z_{\rm true}$ for the on/off problem

For each point in $(\mu_b, \tau)$ space for which one calculates $Z_{\rm true}$, one has a
plane of discrete points $(n_{\rm off}, n_{\rm on})$, with each point having the joint probability
$P(n_{\rm on}|\mu_b) \cdot P(n_{\rm off}|\tau\mu_b)$, where $P$ is the Poisson probability. The joint
probabilities of all the points $(n_{\rm off}, n_{\rm on})$ for which the recipe studied returns
$Z \ge Z_{\rm claim}$ are summed to obtain the Type I error rate for a test with the
implied significance level. Navigating in the plane of $(n_{\rm off}, n_{\rm on})$ is facilitated
by making use of Eqn. 10 and thus considering lines of constant $n_{\rm tot}$, along which
binomial probabilities are calculated to obtain efficiently the contour bounding
the region with $Z \ge Z_{\rm claim}$.



D.1.1 The $Z_{\rm Bi}$ recipe applied to the on/off problem

In this simplest case, $\tau$ is fixed and given, so for each $(n_{\rm off}, n_{\rm on})$ point, $p_{\rm Bi}$ and
$Z_{\rm Bi}$ are calculated from Eqns. 14 and 3, and compared to $Z_{\rm claim}$.
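
The procedure of Sec. D.1 with the $Z_{\rm Bi}$ recipe can be sketched as a brute-force sum in standard-library Python. This is only an illustration under stated assumptions: the function names are hypothetical, the plane of $(n_{\rm off}, n_{\rm on})$ is truncated at a fixed number of standard deviations above each mean, the comparison is made on p-values (equivalent to comparing $Z_{\rm Bi}$ with $Z_{\rm claim}$), and no use is made of the lines of constant $n_{\rm tot}$ that the text describes as the efficient route.

```python
import math
from statistics import NormalDist


def poisson_pmf(n, mu):
    """Poisson probability of n counts for mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def binom_tail(n_on, n_tot, rho):
    """P(N_on >= n_on | n_tot) for binomial parameter rho, i.e. p_Bi."""
    log_rho, log_1m_rho = math.log(rho), math.log1p(-rho)
    return sum(math.exp(math.lgamma(n_tot + 1) - math.lgamma(j + 1)
                        - math.lgamma(n_tot - j + 1)
                        + j * log_rho + (n_tot - j) * log_1m_rho)
               for j in range(n_on, n_tot + 1))


def type1_rate_zbi(mu_b, tau, z_claim, n_sigma=10.0):
    """Sum of joint Poisson probabilities over points where Z_Bi >= z_claim."""
    p_claim = NormalDist().cdf(-z_claim)   # one-sided tail area for z_claim
    rho = 1.0 / (1.0 + tau)

    def upper(mu):
        # Truncation of the count range (an assumption of this sketch).
        return int(mu + n_sigma * math.sqrt(mu) + 20)

    rate = 0.0
    for n_off in range(upper(tau * mu_b) + 1):
        p_off = poisson_pmf(n_off, tau * mu_b)
        for n_on in range(upper(mu_b) + 1):
            # Z_Bi >= z_claim is equivalent to p_Bi <= p_claim.
            if binom_tail(n_on, n_on + n_off, rho) <= p_claim:
                rate += p_off * poisson_pmf(n_on, mu_b)
    return rate


def z_true_zbi(mu_b, tau, z_claim):
    """Convert the Type I error rate back to a Z value."""
    return -NormalDist().inv_cdf(type1_rate_zbi(mu_b, tau, z_claim))


print(z_true_zbi(10.0, 1.0, 3.0))
```

Because the binomial test is conservative for each $n_{\rm tot}$, the resulting $Z_{\rm true}$ in this sketch should not fall below $Z_{\rm claim}$.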



D.1.2 The $Z_N$ recipe applied to the on/off problem

Starting with $n_{\rm on}$, $n_{\rm off}$, and $\tau$, one obtains $\hat\mu_b$ from Eqn. 6, $\sigma_b$ from Eqn. 7,
and proceeds as usual. ($f$ is thereby equal to $1/\sqrt{n_{\rm off}}$.)
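
The correspondence between the on/off inputs and the Gaussian-mean inputs can be sketched in a few lines of Python. Eqns. 6-8 are not reproduced in this Appendix, so the forms used here are inferred from the relations quoted in Appendices D and E ($n_{\rm off} = \hat\mu_b \tau$, $\tau = \hat\mu_b/\sigma_b^2$, and $f = 1/\sqrt{n_{\rm off}}$); the function names are illustrative.

```python
import math


def on_off_to_gaussian(n_off, tau):
    """Map (n_off, tau) to (mu_b_hat, sigma_b, f); requires n_off > 0."""
    mu_b_hat = n_off / tau              # inferred form of Eqn. 6
    sigma_b = math.sqrt(n_off) / tau    # inferred form of Eqn. 7
    f = sigma_b / mu_b_hat              # equals 1 / sqrt(n_off)
    return mu_b_hat, sigma_b, f


def gaussian_to_on_off(mu_b_hat, sigma_b):
    """Inverse mapping: (mu_b_hat, sigma_b) to (n_off, tau); n_off may be non-integer."""
    tau = mu_b_hat / sigma_b ** 2       # correspondence of Eqn. 8
    n_off = tau * mu_b_hat
    return n_off, tau


mu_b_hat, sigma_b, f = on_off_to_gaussian(100, 1.2)
print(mu_b_hat, sigma_b, f)
print(gaussian_to_on_off(mu_b_hat, sigma_b))
```

With $n_{\rm off} = 100$ and $\tau = 1.2$ this gives the $\hat\mu_b \approx 83.33$, $\sigma_b \approx 8.333$ used in the example of Appendix E.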



D.1.3 The profile likelihood method applied to the on/off problem

We proceed exactly as in Sec. D.1.1, but instead of calculating a p-value and
then $Z$ at each point $(n_{\rm off}, n_{\rm on})$, $Z$ is calculated directly from Eqn. 24.
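
Eqn. 24 is not reproduced in this Appendix. The sketch below assumes it has the standard likelihood-ratio form for the on/off problem (Eqn. 17 of Li and Ma, written with $\tau = 1/\alpha$); if the paper's Eqn. 24 differs, this block should be read only as an illustration of a direct $Z$ calculation. The function name and the convention of returning zero when there is no excess are choices made for this sketch.

```python
import math


def z_profile_likelihood(n_on, n_off, tau):
    """One-sided Z from the on/off likelihood ratio (assumed Li-Ma form)."""
    n_tot = n_on + n_off
    # No excess over the scaled background estimate n_off / tau: report Z = 0.
    if n_on * tau <= n_off:
        return 0.0
    term_on = n_on * math.log(n_on * (1.0 + tau) / n_tot)
    # The n_off = 0 term is taken as its limiting value, zero.
    term_off = (n_off * math.log(n_off * (1.0 + tau) / (n_tot * tau))
                if n_off > 0 else 0.0)
    # Guard against tiny negative round-off very close to the no-excess boundary.
    return math.sqrt(max(0.0, 2.0 * (term_on + term_off)))


print(z_profile_likelihood(140, 100, 1.2))
```

For the example of Appendix E ($n_{\rm on} = 140$, $n_{\rm off} = 100$, $\tau = 1.2$) this form gives a value close to 4.0, slightly above the $Z_{\rm Bi} = 3.93$ quoted there.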



D.2 Details of calculation of $Z_{\rm true}$ for the Gaussian-mean background problem

For each point in $(f, \mu_b)$ space for which one calculates $Z_{\rm true}$ corresponding to
a particular $Z_{\rm claim}$, one considers all values of $n_{\rm on}$, and for each value of $n_{\rm on}$ one
finds (via a binary search) the critical value of $\hat\mu_b$ such that $Z = Z_{\rm claim}$. Then
the Type I error rate is the sum of the products of the probability of obtaining
each $n_{\rm on}$ and the Gaussian tail probability for $\hat\mu_b$ such that $Z \ge Z_{\rm claim}$ for that
$n_{\rm on}$. The tail probability is obtained using the error function and true values
of $\mu_b$ and $\sigma_b = f\mu_b$.
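
The binary search and the sum over $n_{\rm on}$ can be sketched as follows for the case of exactly known $\sigma_b$. The recipe used here, $(n_{\rm on} - \hat\mu_b)/\sqrt{\hat\mu_b + \sigma_b^2}$, is a simple stand-in chosen only so that the example is self-contained; it is not $Z_N$, $Z_{\rm Bi}$, or the profile likelihood of this paper. The sketch further assumes that the recipe decreases monotonically with $\hat\mu_b$, restricts the search to $0 \le \hat\mu_b \le n_{\rm on}$, and uses the untruncated Gaussian tail below the critical value. All names are illustrative.

```python
import math
from statistics import NormalDist


def poisson_pmf(n, mu):
    """Poisson probability of n counts for mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def z_standin(n_on, mu_b_hat, sigma_b):
    """Illustrative stand-in recipe (NOT one of the recipes studied in the paper)."""
    return (n_on - mu_b_hat) / math.sqrt(mu_b_hat + sigma_b ** 2)


def critical_mu_b_hat(recipe, n_on, sigma_b, z_claim, n_iter=100):
    """Largest mu_b_hat in [0, n_on] with recipe >= z_claim, or None if none exists."""
    lo, hi = 0.0, float(n_on)
    if recipe(n_on, lo, sigma_b) < z_claim:
        return None
    for _ in range(n_iter):              # bisection; keeps recipe(lo) >= z_claim
        mid = 0.5 * (lo + hi)
        if recipe(n_on, mid, sigma_b) >= z_claim:
            lo = mid
        else:
            hi = mid
    return lo


def type1_rate(recipe, mu_b, sigma_b, z_claim):
    """Sum over n_on of P(n_on | mu_b) times the Gaussian tail below the critical value."""
    n_max = int(mu_b + 12.0 * math.sqrt(mu_b) + 30)   # truncation (assumption)
    rate = 0.0
    for n_on in range(n_max + 1):
        crit = critical_mu_b_hat(recipe, n_on, sigma_b, z_claim)
        if crit is None:
            continue
        # P(mu_b_hat <= crit) for mu_b_hat ~ Normal(mu_b, sigma_b), via erfc.
        tail = 0.5 * math.erfc((mu_b - crit) / (sigma_b * math.sqrt(2.0)))
        rate += poisson_pmf(n_on, mu_b) * tail
    return rate


def z_true(recipe, mu_b, sigma_b, z_claim):
    """Convert the Type I error rate back to a Z value."""
    return -NormalDist().inv_cdf(type1_rate(recipe, mu_b, sigma_b, z_claim))


print(z_true(z_standin, 50.0, 5.0, 3.0))
```

Any recipe with the same call signature can be passed in place of `z_standin`; for known relative uncertainty $f$, the recipe would first set $\sigma_b = f\hat\mu_b$ as described in the following subsections.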



D.2.1 The $Z_N$ recipe applied to the Gaussian-mean background problem

In the case where $\sigma_b$ is assumed known, $Z_N$ is directly computed; in the case
where $f$ is known, $\sigma_b$ is first estimated by $f\hat\mu_b$.






D.2.2 The $Z_{\rm Bi}$ recipe applied to the Gaussian-mean background problem

This again uses the rough correspondence of Eqn. 8. In the case where $\sigma_b$ is
known exactly, then for each $n_{\rm on}$, one searches for $\hat\mu_b$ such that when $\hat\mu_b$ is
used in Eqns. 8 and 6 to obtain $\tau$ and $n_{\rm off}$, the resulting $Z_{\rm Bi}$ from Eqns. 14
and 3 is equal to $Z_{\rm claim}$. In the case where $f$ is known exactly, as usual one
first estimates $\sigma_b$ by $f\hat\mu_b$ and then in the same way finds the critical value of
$\hat\mu_b$. (I.e., one computes $\tau = \hat\mu_b/(f\hat\mu_b)^2$ and $n_{\rm off} = \hat\mu_b\tau$, from which one obtains
$Z_{\rm Bi}$.)

D.2.3 The profile likelihood method applied to the Gaussian-mean background
problem

Once more using the rough correspondence of Eqn. 8, we search for the $\hat\mu_b$
such that when calculating $\tau$ and $n_{\rm off}$ as in Sec. D.2.2, the resulting $Z$ using
Eqn. 24 is equal to $Z_{\rm claim}$.



E Implementation of $Z_{\rm Bi}$ in ROOT

As noted in Sec. 3, the ratio in Eqn. 14 is implemented in ROOT [20] following
the algorithm in Numerical Recipes [19]; therefore one simply calls BetaIncomplete
to obtain the p-value, and then ErfInverse to convert it to $Z$ according
to Eqn. 3.

For the simple on/off problem with $n_{\rm on} = 140$, $n_{\rm off} = 100$, and $\tau = 1.2$, the
ROOT commands are:

    double n_on = 140.;
    double n_off = 100.;
    double tau = 1.2;

    // p-value from the incomplete beta function (Eqn. 14)
    double P_Bi = TMath::BetaIncomplete(1./(1.+tau), n_on, n_off+1);
    // convert the p-value to Z (Eqn. 3)
    double Z_Bi = sqrt(2)*TMath::ErfInverse(1 - 2*P_Bi);

yielding $p_{\rm Bi} = 4.19 \times 10^{-5}$ and $Z_{\rm Bi} = 3.93$.

In order to apply $Z_{\rm Bi}$ to the Gaussian-mean background problem, consider
for example the observations $n_{\rm on} = 140$ and $\hat\mu_b = 83.3 \pm 8.33$. Using the
correspondence in Eqn. 8 to obtain $\tau$, and then Eqn. 6 to obtain $n_{\rm off} = \hat\mu_b\tau$,
the ROOT commands are similarly

    double n_on = 140.;
    double mu_b_hat = 83.33;
    double sigma_b = 8.333;
    // correspondence of Eqn. 8, then Eqn. 6
    double tau = mu_b_hat/(sigma_b*sigma_b);
    double n_off = tau*mu_b_hat;

    double P_Bi = TMath::BetaIncomplete(1./(1.+tau), n_on, n_off+1);
    double Z_Bi = sqrt(2)*TMath::ErfInverse(1 - 2*P_Bi);

The result in this example is then identical to the on/off example within
round-off error, since $\hat\mu_b$ and $\sigma_b$ were chosen to reproduce the same
$\tau$ and $n_{\rm off}$.
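
For readers without ROOT, the same example can be cross-checked in standard-library Python. Since $n_{\rm off}$ obtained from Eqn. 8 is in general non-integer, the sketch below evaluates the regularized incomplete beta function by composite Simpson integration of the beta density. This is a swapped-in technique chosen for brevity, not the Numerical Recipes algorithm used by ROOT, and it assumes both shape parameters exceed 1; the function names are illustrative.

```python
import math
from statistics import NormalDist


def reg_inc_beta(x, a, b, n_steps=20000):
    """I_x(a, b) by composite Simpson integration (assumes a > 1, b > 1, n_steps even)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(t):
        # Beta density; zero at the end points when a > 1 and b > 1.
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log1p(-t))

    h = x / n_steps
    total = density(0.0) + density(x)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * density(i * h)
    return total * h / 3.0


n_on = 140.0
mu_b_hat = 83.33
sigma_b = 8.333
tau = mu_b_hat / (sigma_b * sigma_b)   # correspondence of Eqn. 8
n_off = tau * mu_b_hat                 # Eqn. 6

p_bi = reg_inc_beta(1.0 / (1.0 + tau), n_on, n_off + 1.0)
z_bi = -NormalDist().inv_cdf(p_bi)     # same as sqrt(2) * erfinv(1 - 2p)
print(p_bi, z_bi)
```

The printed values should reproduce the $p_{\rm Bi}$ and $Z_{\rm Bi}$ quoted above to the precision given there.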

As $\sigma_b$ becomes small, $\tau$ and $n_{\rm off}$ become large, so ironically this implementation
encounters numerical trouble for small uncertainty on the background (and in
particular background known exactly). For such small errors on background,
neglecting them using Eqn. 15 seems reasonable but should be studied further.
The implementation of the incomplete beta function by Majumder and
Bhattacharjee [21] used for the coverage calculations in Sec. 3 provides some
expanded capability. Beyond that, one may consider using the asymptotic
formulas in Eqn. 4.